* [PATCH v6 00/15] Fair DRM scheduler
@ 2025-07-08  9:51 Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 01/15] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
                   ` (15 more replies)
  0 siblings, 16 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Leo Liu, Matthew Brost,
	Philipp Stanner, Pierre-Eric Pelloux-Prayer, Michel Dänzer

As a summary, the new scheduling algorithm is inspired by the original Linux
CFS and so far no scheduling regressions have been found. There are improvements
in fairness and scheduling of interactive clients when running in parallel with
a heavy GPU load (for example Pierre-Eric has one viewperf medical test which
shows a nice improvement with amdgpu).

At a high level, the main advantages of the series are:

 1. Scheduling quality - schedules better than FIFO, solves priority starvation.
 2. Code simplification - no more multiple run queues and multiple algorithms.
 3. Virtual GPU time based scheduling enables relatively simple addition
    of a scheduling cgroup controller in the future. (The core idea is
    sketched just below.)
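
Purely as an illustration of that core idea, here is a rough userspace sketch
(not the patch code; the names, the 2:1 weight ratio and the 1024 scaling base
are all invented for this example). Each entity is charged the GPU time its
jobs consumed, scaled inversely by a priority weight, and the entity with the
least virtual GPU time runs next:

  /* Illustrative sketch only, not the patch code. */
  #include <stdio.h>

  struct entity {
          const char *name;
          unsigned int weight;            /* higher weight => larger GPU share */
          unsigned long long vruntime_ns; /* virtual GPU time consumed so far */
  };

  /* Pick the runnable entity with the smallest virtual GPU time. */
  static struct entity *pick_entity(struct entity *e, int n)
  {
          struct entity *best = NULL;
          int i;

          for (i = 0; i < n; i++)
                  if (!best || e[i].vruntime_ns < best->vruntime_ns)
                          best = &e[i];

          return best;
  }

  /* Charge a finished job's runtime, scaled inversely by the weight. */
  static void charge(struct entity *e, unsigned long long runtime_ns)
  {
          e->vruntime_ns += runtime_ns * 1024 / e->weight;
  }

  int main(void)
  {
          struct entity clients[] = {
                  { "normal", 1024, 0 },
                  { "low",     512, 0 },
          };
          int i;

          for (i = 0; i < 9; i++) {
                  struct entity *e = pick_entity(clients, 2);

                  printf("run a job from %s\n", e->name);
                  charge(e, 8000000); /* pretend each job took 8ms */
          }

          return 0;
  }

With the 2:1 weight ratio "normal" gets picked twice for every pick of "low",
so lower priority clients keep making progress instead of being starved.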

There is a somewhat more detailed write-up on the motivation and results in
the form of a blog post, which may be easier to read:
https://blogs.igalia.com/tursulin/fair-er-drm-gpu-scheduler/

The first patches add some unit tests which allow for easy evaluation of
scheduling behaviour against different client submission patterns. From there
onwards it is hopefully a natural progression of cleanups, enablers, adding the
fair policy, and finally removing FIFO and RR and simplifying the code base,
since multiple run queues are no longer needed.

As a headline result I have tested three simultaneous clients on the Steam Deck:

One instance of a deferredmultisampling Vulkan demo running with low priority,
one normal priority instance of the same demo, and the Unigine Heaven benchmark.

With the FIFO scheduler we can see that the low priority client is completely
starved and the GPU time distribution between the other two clients is uneven:

https://people.igalia.com/tursulin/drm-sched-fair/fifo-starvation.png

Switching to the fair scheduler, GPU time distribution is almost equal and the
low priority client does get a small share of the GPU:

https://people.igalia.com/tursulin/drm-sched-fair/fair-no-starvation.png

Moving on to the synthetic submission patterns: these all use two simultaneous
clients and broadly cover the following categories:

 * Deep queue clients
 * Hogs versus interactive
 * Priority handling

Let's look at the results:

1. Two normal priority deep queue clients.

These submit one second's worth of 8ms jobs as fast as they can, with no
dependencies etc. There is no difference in runtime between FIFO and fair, but
the latter allows both clients to progress with work more evenly:

https://people.igalia.com/tursulin/drm-sched-fair/normal-normal.png

(X axis is time, Y is submitted queue-depth, hence the lowering of qd
  corresponds to work progress for both clients; tested with both schedulers
  separately.)

2. Same two clients but one is now low priority.

https://people.igalia.com/tursulin/drm-sched-fair/normal-low.png

The normal priority client is a solid line, the low priority one dotted. We can
see how FIFO completely starves the low priority client until the normal
priority one is fully done. Only then does the low priority client get any GPU
time.

In contrast, the fair scheduler allows some GPU time to the low priority client.

3. Same clients but now high versus normal priority.

Similar behaviour to the previous case, with normal being a bit less
de-prioritised relative to high than low was relative to normal.

https://people.igalia.com/tursulin/drm-sched-fair/high-normal.png

4. Heavy load vs interactive client.

The heavy client emits a 75% GPU load in the form of 3x 2.5ms jobs followed by
a 2.5ms wait. The interactive client emits a 10% GPU load in the form of 1x 1ms
job followed by a 9ms wait.

This simulates an interactive graphical client used on top of a relatively heavy
background load but no GPU oversubscription.
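
(The percentages follow from job_cnt * job_us / (job_cnt * job_us + wait_us):
3 x 2.5ms / (7.5ms + 2.5ms) = 75% and 1ms / (1ms + 9ms) = 10%.)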

Graphs show the interactive client only and from now on, instead of looking at
the client's queue depth, we look at its "fps".

https://people.igalia.com/tursulin/drm-sched-fair/heavy-interactive.png

We can see that the fair scheduler allows a higher fps for the interactive
client, which is good.

5. An even heavier load vs interactive client.

This one oversubscribes the GPU by submitting 4x 50ms jobs and waiting for
only one microsecond before repeating the cycle. The interactive client is the
same 10% one as above.

https://people.igalia.com/tursulin/drm-sched-fair/veryheavy-interactive.png

Here the difference is even more dramatic, with the fair scheduler enabling
~3x the framerate for the interactive client.

6. Low priority GPU hog versus heavy-interactive.

Low priority client: 3x 2.5ms jobs client followed by a 0.5ms wait.
Interactive client: 1x 0.5ms job followed by a 10ms wait.

https://people.igalia.com/tursulin/drm-sched-fair/lowhog-interactive.png

A slight win for the fair scheduler, but it could be just noise.

7. The last set of test scenarios has three subgroups.

In all cases we have two interactive (synchronous, single job at a time) clients
with a 50% "duty cycle" GPU time usage.

Client 1: 1.5ms job + 1.5ms wait (aka short bursty)
Client 2: 2.5ms job + 2.5ms wait (aka long bursty)

a) Both normal priority.

https://people.igalia.com/tursulin/drm-sched-fair/5050-short.png
https://people.igalia.com/tursulin/drm-sched-fair/5050-long.png

Both schedulers favour the higher frequency duty cycle, with fair giving it a
little bit more, which should be good for interactivity.

b) Normal vs low priority.

https://people.igalia.com/tursulin/drm-sched-fair/5050-normal-low-normal.png
https://people.igalia.com/tursulin/drm-sched-fair/5050-normal-low-low.png

The fair scheduler gives a bit more GPU time to the normal priority client,
which is again good.

c) High vs normal priority.

https://people.igalia.com/tursulin/drm-sched-fair/5050-high-normal-high.png
https://people.igalia.com/tursulin/drm-sched-fair/5050-high-normal-normal.png

Again, the fair scheduler gives a bit more share to the higher priority client.

As before, I am looking for feedback, ideas for what other kinds of submission
scenarios to test, testing on different GPUs and of course reviews.

v2:
 * Fixed many rebase errors.
 * Added some new patches.
 * Dropped single shot dependency handling.

v3:
 * Added scheduling quality unit tests.
 * Refined a tiny bit by adding some fairness.
 * Dropped a few patches for now.

v4:
 * Replaced deadline with fair!
 * Refined scheduling quality unit tests.
 * Pulled one cleanup patch earlier.
 * Fixed "drm/sched: Avoid double re-lock on the job free path".

v5:
 * Rebase on top of latest upstream DRM scheduler changes.
 * Kerneldoc fixup.
 * Improve commit message justification for one patch. (Philipp)
 * Add comment in drm_sched_alloc_wq. (Christian)

v6:
 * Rebase for "drm/sched: De-clutter drm_sched_init" getting merged.
 * Avoid NULL rq dereference from a bad rebase. (Maira)
 * Added some kerneldoc throughout. (Maira)
 * Removed some lockdep annotations not belonging to one patch. (Maira)
 * Use dma_fence_is_signaled in "drm/sched: Avoid double re-lock on the job free path". (Maira, Philipp)

Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
CC: Leo Liu <Leo.Liu@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Michel Dänzer <michel.daenzer@mailbox.org>

Tvrtko Ursulin (15):
  drm/sched: Add some scheduling quality unit tests
  drm/sched: Add some more scheduling quality unit tests
  drm/sched: Avoid double re-lock on the job free path
  drm/sched: Consolidate drm_sched_job_timedout
  drm/sched: Consolidate drm_sched_rq_select_entity_rr
  drm/sched: Implement RR via FIFO
  drm/sched: Consolidate entity run queue management
  drm/sched: Move run queue related code into a separate file
  drm/sched: Free all finished jobs at once
  drm/sched: Account entity GPU time
  drm/sched: Remove idle entity from tree
  drm/sched: Add fair scheduling policy
  drm/sched: Remove FIFO and RR and simplify to a single run queue
  drm/sched: Queue all free credits in one worker invocation
  drm/sched: Embed run queue singleton into the scheduler

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |  27 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h       |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h     |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c   |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c       |   8 +-
 drivers/gpu/drm/scheduler/Makefile            |   2 +-
 drivers/gpu/drm/scheduler/sched_entity.c      | 131 ++-
 drivers/gpu/drm/scheduler/sched_fence.c       |   2 +-
 drivers/gpu/drm/scheduler/sched_internal.h    | 146 +++-
 drivers/gpu/drm/scheduler/sched_main.c        | 548 +++---------
 drivers/gpu/drm/scheduler/sched_rq.c          | 221 +++++
 drivers/gpu/drm/scheduler/tests/Makefile      |   3 +-
 .../gpu/drm/scheduler/tests/tests_scheduler.c | 815 ++++++++++++++++++
 include/drm/gpu_scheduler.h                   |  31 +-
 15 files changed, 1396 insertions(+), 565 deletions(-)
 create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c
 create mode 100644 drivers/gpu/drm/scheduler/tests/tests_scheduler.c

-- 
2.48.0


* [PATCH v6 01/15] drm/sched: Add some scheduling quality unit tests
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-09 11:16   ` kernel test robot
  2025-07-08  9:51 ` [PATCH v6 02/15] drm/sched: Add some more " Tvrtko Ursulin
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner, Pierre-Eric Pelloux-Prayer

To make evaluating different scheduling policies easier (no need for
external benchmarks) and perfectly repeatable, let's add some synthetic
workloads built upon the mock scheduler unit test infrastructure.

The focus is on two parallel clients (two threads) submitting different job
patterns and logging their progress and some overall metrics. This is
repeated for scheduler credit limits of both 1 and 2.

Example test output:

  Normal and low:
                    pct1 cps1 qd1;  pct2 cps2 qd2
        +     0ms:   0     0    0;   0     0    0
        +   104ms: 100  1240  112; 100  1240  125
        +   209ms: 100     0   99; 100     0  125
        +   313ms: 100     0   86; 100     0  125
        +   419ms: 100     0   73; 100     0  125
        +   524ms: 100     0   60; 100     0  125
        +   628ms: 100     0   47; 100     0  125
        +   731ms: 100     0   34; 100     0  125
        +   836ms: 100     0   21; 100     0  125
        +   939ms: 100     0    8; 100     0  125
        +  1043ms:               ; 100     0  120
        +  1147ms:               ; 100     0  107
        +  1252ms:               ; 100     0   94
        +  1355ms:               ; 100     0   81
        +  1459ms:               ; 100     0   68
        +  1563ms:               ; 100     0   55
        +  1667ms:               ; 100     0   42
        +  1771ms:               ; 100     0   29
        +  1875ms:               ; 100     0   16
        +  1979ms:               ; 100     0    3
    0: prio=normal sync=0 elapsed_ms=1015ms (ideal_ms=1000ms) cycle_time(min,avg,max)=134,222,978 us latency_time(min,avg,max)=134,222,978 us
    1: prio=low sync=0 elapsed_ms=2009ms (ideal_ms=1000ms) cycle_time(min,avg,max)=134,215,806 us latency_time(min,avg,max)=134,215,806 us

There we have two clients represented in the two respective columns, with
their progress logged roughly every 100 milliseconds. The metrics are:

 - pct - Percentage progress of the job submit part
 - cps - Cycles per second
 - qd  - Queue depth - number of submitted unfinished jobs

The cycles per second metric stems from the fact that workload patterns
are a data driven cycling sequence of:

 - Submit 1..N jobs
 - Wait for Nth job to finish (optional)
 - Sleep (optional)
 - Repeat from start

In this particular example we have a normal priority and a low priority
client, both spamming the scheduler with 8ms jobs with no sync and no
sleeping. Hence they build very deep queues and we can see how the low
priority client is completely starved until the normal priority one finishes.

Note that the PCT and CPS metrics are irrelevant for "unsync" clients
since they manage to complete all of their cycles instantaneously.

A different example would be:

  Heavy and interactive:
                    pct1 cps1 qd1;  pct2 cps2 qd2
        +     0ms:   0     0    0;   0     0    0
        +   106ms:   5    40    3;   5    40    0
        +   209ms:   9    40    0;   9    40    0
        +   314ms:  14    50    3;  14    50    0
        +   417ms:  18    40    0;  18    40    0
        +   522ms:  23    50    3;  23    50    0
        +   625ms:  27    40    0;  27    40    1
        +   729ms:  32    50    0;  32    50    0
        +   833ms:  36    40    1;  36    40    0
        +   937ms:  40    40    0;  40    40    0
        +  1041ms:  45    50    0;  45    50    0
        +  1146ms:  49    40    1;  49    40    1
        +  1249ms:  54    50    0;  54    50    0
        +  1353ms:  58    40    1;  58    40    0
        +  1457ms:  62    40    0;  62    40    1
        +  1561ms:  67    50    0;  67    50    0
        +  1665ms:  71    40    1;  71    40    0
        +  1772ms:  76    50    0;  76    50    0
        +  1877ms:  80    40    1;  80    40    0
        +  1981ms:  84    40    0;  84    40    0
        +  2085ms:  89    50    0;  89    50    0
        +  2189ms:  93    40    1;  93    40    0
        +  2293ms:  97    40    0;  97    40    1

In this case client one is submitting 3x 2.5ms jobs, waiting for the 3rd
and then sleeping for 2.5ms (in effect causing a 75% GPU load, minus the
overheads). The second client is submitting 1ms jobs, waiting for each to
finish and sleeping for 9ms (an effective 10% GPU load). Here we can see
the PCT and CPS reflecting real progress.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/tests/Makefile      |   3 +-
 .../gpu/drm/scheduler/tests/tests_scheduler.c | 631 ++++++++++++++++++
 2 files changed, 633 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/scheduler/tests/tests_scheduler.c

diff --git a/drivers/gpu/drm/scheduler/tests/Makefile b/drivers/gpu/drm/scheduler/tests/Makefile
index 5bf707bad373..9ec185fbbc15 100644
--- a/drivers/gpu/drm/scheduler/tests/Makefile
+++ b/drivers/gpu/drm/scheduler/tests/Makefile
@@ -2,6 +2,7 @@
 
 drm-sched-tests-y := \
         mock_scheduler.o \
-        tests_basic.o
+        tests_basic.o \
+        tests_scheduler.o
 
 obj-$(CONFIG_DRM_SCHED_KUNIT_TEST) += drm-sched-tests.o
diff --git a/drivers/gpu/drm/scheduler/tests/tests_scheduler.c b/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
new file mode 100644
index 000000000000..b66321ef7abe
--- /dev/null
+++ b/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
@@ -0,0 +1,631 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Valve Corporation */
+
+#include <linux/delay.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+
+#include "sched_tests.h"
+
+/*
+ * The DRM scheduler scheduling quality tests exercise load balancing
+ * decisions, i.e. entity selection logic.
+ */
+
+static int drm_sched_scheduler_init(struct kunit *test)
+{
+	struct drm_mock_scheduler *sched;
+
+	sched = drm_mock_sched_new(test, MAX_SCHEDULE_TIMEOUT);
+	sched->base.credit_limit = 1;
+
+	test->priv = sched;
+
+	return 0;
+}
+
+static int drm_sched_scheduler_init2(struct kunit *test)
+{
+	struct drm_mock_scheduler *sched;
+
+	sched = drm_mock_sched_new(test, MAX_SCHEDULE_TIMEOUT);
+	sched->base.credit_limit = 2;
+
+	test->priv = sched;
+
+	return 0;
+}
+
+static void drm_sched_scheduler_exit(struct kunit *test)
+{
+	struct drm_mock_scheduler *sched = test->priv;
+
+	drm_mock_sched_fini(sched);
+}
+
+static void drm_sched_scheduler_queue_overhead(struct kunit *test)
+{
+	struct drm_mock_scheduler *sched = test->priv;
+	struct drm_mock_sched_entity *entity;
+	const unsigned int job_us = 1000;
+	const unsigned int jobs = 1000;
+	const unsigned int total_us = jobs * job_us;
+	struct drm_mock_sched_job *job, *first;
+	ktime_t start, end;
+	bool done;
+	int i;
+
+	/*
+	 * Deep queue job at a time processing (single credit).
+	 *
+	 * This measures the overhead of picking and processing a job at a time
+	 * by comparing the ideal total "GPU" time of all submitted jobs versus
+	 * the time actually taken.
+	 */
+
+	KUNIT_ASSERT_EQ(test, sched->base.credit_limit, 1);
+
+	entity = drm_mock_sched_entity_new(test,
+					   DRM_SCHED_PRIORITY_NORMAL,
+					   sched);
+
+	for (i = 0; i <= jobs; i++) {
+		job = drm_mock_sched_job_new(test, entity);
+		if (i == 0)
+			first = job; /* Extra first job blocks the queue */
+		else
+			drm_mock_sched_job_set_duration_us(job, job_us);
+		drm_mock_sched_job_submit(job);
+	}
+
+	done = drm_mock_sched_job_wait_scheduled(first, HZ);
+	KUNIT_ASSERT_TRUE(test, done);
+
+	start = ktime_get();
+	i = drm_mock_sched_advance(sched, 1); /* Release the queue */
+	KUNIT_ASSERT_EQ(test, i, 1);
+
+	done = drm_mock_sched_job_wait_finished(job,
+						usecs_to_jiffies(total_us) * 5);
+	end = ktime_get();
+	KUNIT_ASSERT_TRUE(test, done);
+
+	pr_info("Expected %uus, actual %lldus\n",
+		total_us,
+		ktime_to_us(ktime_sub(end, start)));
+
+	drm_mock_sched_entity_free(entity);
+}
+
+static void drm_sched_scheduler_ping_pong(struct kunit *test)
+{
+	struct drm_mock_sched_job *job, *first, *prev = NULL;
+	struct drm_mock_scheduler *sched = test->priv;
+	struct drm_mock_sched_entity *entity[2];
+	const unsigned int job_us = 1000;
+	const unsigned int jobs = 1000;
+	const unsigned int total_us = jobs * job_us;
+	ktime_t start, end;
+	bool done;
+	int i;
+
+	/*
+	 * Two entities in an inter-dependency chain.
+	 *
+	 * This measures the overhead of picking and processing a job at a time,
+	 * where each job depends on the previous one from the other
+	 * entity, by comparing the ideal total "GPU" time of all submitted jobs
+	 * versus the time actually taken.
+	 */
+
+	KUNIT_ASSERT_EQ(test, sched->base.credit_limit, 1);
+
+	for (i = 0; i < ARRAY_SIZE(entity); i++)
+		entity[i] = drm_mock_sched_entity_new(test,
+						      DRM_SCHED_PRIORITY_NORMAL,
+						      sched);
+
+	for (i = 0; i <= jobs; i++) {
+		job = drm_mock_sched_job_new(test, entity[i & 1]);
+		if (i == 0)
+			first = job; /* Extra first job blocks the queue */
+		else
+			drm_mock_sched_job_set_duration_us(job, job_us);
+		if (prev)
+			drm_sched_job_add_dependency(&job->base,
+						     dma_fence_get(&prev->base.s_fence->finished));
+		drm_mock_sched_job_submit(job);
+		prev = job;
+	}
+
+	done = drm_mock_sched_job_wait_scheduled(first, HZ);
+	KUNIT_ASSERT_TRUE(test, done);
+
+	start = ktime_get();
+	i = drm_mock_sched_advance(sched, 1); /* Release the queue */
+	KUNIT_ASSERT_EQ(test, i, 1);
+
+	done = drm_mock_sched_job_wait_finished(job,
+						usecs_to_jiffies(total_us) * 5);
+	end = ktime_get();
+	KUNIT_ASSERT_TRUE(test, done);
+
+	pr_info("Expected %uus, actual %lldus\n",
+		total_us,
+		ktime_to_us(ktime_sub(end, start)));
+
+	for (i = 0; i < ARRAY_SIZE(entity); i++)
+		drm_mock_sched_entity_free(entity[i]);
+}
+
+static struct kunit_case drm_sched_scheduler_overhead_tests[] = {
+	KUNIT_CASE_SLOW(drm_sched_scheduler_queue_overhead),
+	KUNIT_CASE_SLOW(drm_sched_scheduler_ping_pong),
+	{}
+};
+
+static struct kunit_suite drm_sched_scheduler_overhead = {
+	.name = "drm_sched_scheduler_overhead_tests",
+	.init = drm_sched_scheduler_init,
+	.exit = drm_sched_scheduler_exit,
+	.test_cases = drm_sched_scheduler_overhead_tests,
+};
+
+struct drm_sched_client_params {
+	enum drm_sched_priority priority;
+	unsigned int job_cnt;
+	unsigned int job_us;
+	unsigned int wait_us;
+	bool sync;
+};
+
+struct drm_sched_test_params {
+	const char *description;
+	struct drm_sched_client_params client[2];
+};
+
+static const struct drm_sched_test_params drm_sched_cases[] = {
+	{
+		.description = "Normal and normal",
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 1,
+			.job_us = 8000,
+			.wait_us = 0,
+			.sync = false,
+		},
+		.client[1] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 1,
+			.job_us = 8000,
+			.wait_us = 0,
+			.sync = false,
+		},
+	},
+	{
+		.description = "Normal and low",
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 1,
+			.job_us = 8000,
+			.wait_us = 0,
+			.sync = false,
+		},
+		.client[1] = {
+			.priority = DRM_SCHED_PRIORITY_LOW,
+			.job_cnt = 1,
+			.job_us = 8000,
+			.wait_us = 0,
+			.sync = false,
+		},
+	},
+	{
+		.description = "High and normal",
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_HIGH,
+			.job_cnt = 1,
+			.job_us = 8000,
+			.wait_us = 0,
+			.sync = false,
+		},
+		.client[1] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 1,
+			.job_us = 8000,
+			.wait_us = 0,
+			.sync = false,
+		},
+	},
+	{
+		.description = "High and low",
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_HIGH,
+			.job_cnt = 1,
+			.job_us = 8000,
+			.wait_us = 0,
+			.sync = false,
+		},
+		.client[1] = {
+			.priority = DRM_SCHED_PRIORITY_LOW,
+			.job_cnt = 1,
+			.job_us = 8000,
+			.wait_us = 0,
+			.sync = false,
+		},
+	},
+	{
+		.description = "50 and 50",
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 1,
+			.job_us = 1500,
+			.wait_us = 1500,
+			.sync = true,
+		},
+		.client[1] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 1,
+			.job_us = 2500,
+			.wait_us = 2500,
+			.sync = true,
+		},
+	},
+	{
+		.description = "50 and 50 low",
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 1,
+			.job_us = 1500,
+			.wait_us = 1500,
+			.sync = true,
+		},
+		.client[1] = {
+			.priority = DRM_SCHED_PRIORITY_LOW,
+			.job_cnt = 1,
+			.job_us = 2500,
+			.wait_us = 2500,
+			.sync = true,
+		},
+	},
+	{
+		.description = "50 high and 50",
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_HIGH,
+			.job_cnt = 1,
+			.job_us = 1500,
+			.wait_us = 1500,
+			.sync = true,
+		},
+		.client[1] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 1,
+			.job_us = 2500,
+			.wait_us = 2500,
+			.sync = true,
+		},
+	},
+	{
+		.description = "Low hog and interactive",
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_LOW,
+			.job_cnt = 3,
+			.job_us = 2500,
+			.wait_us = 500,
+			.sync = false,
+		},
+		.client[1] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 1,
+			.job_us = 500,
+			.wait_us = 10000,
+			.sync = true,
+		},
+	},
+	{
+		.description = "Heavy and interactive",
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 3,
+			.job_us = 2500,
+			.wait_us = 2500,
+			.sync = true,
+		},
+		.client[1] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 1,
+			.job_us = 1000,
+			.wait_us = 9000,
+			.sync = true,
+		},
+	},
+	{
+		.description = "Very heavy and interactive",
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 4,
+			.job_us = 50000,
+			.wait_us = 1,
+			.sync = true,
+		},
+		.client[1] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 1,
+			.job_us = 1000,
+			.wait_us = 9000,
+			.sync = true,
+		},
+	},
+};
+
+static void
+drm_sched_desc(const struct drm_sched_test_params *params, char *desc)
+{
+	strscpy(desc, params->description, KUNIT_PARAM_DESC_SIZE);
+}
+
+KUNIT_ARRAY_PARAM(drm_sched_scheduler_two_clients,
+		  drm_sched_cases,
+		  drm_sched_desc);
+
+struct test_client_stats {
+	unsigned long min_us;
+	unsigned long max_us;
+	unsigned long avg_us;
+};
+
+struct test_client {
+	struct kunit *test;
+
+	struct drm_mock_sched_entity	*entity;
+
+	struct kthread_worker	*worker;
+	struct kthread_work	work;
+
+	unsigned int id;
+	ktime_t duration;
+
+	struct drm_sched_client_params params;
+
+	ktime_t ideal_duration;
+	unsigned int cycles;
+	unsigned int cycle;
+	ktime_t	start;
+	ktime_t	end;
+	bool done;
+
+	struct test_client_stats cycle_time;
+	struct test_client_stats latency_time;
+};
+
+static void
+update_stats(struct test_client_stats *stats, unsigned int n, unsigned long us)
+{
+	if (us > stats->max_us)
+		stats->max_us = us;
+	if (us < stats->min_us)
+		stats->min_us = us;
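+	/* Running average over n + 1 samples, rounded up. */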
+	stats->avg_us = DIV_ROUND_UP(n * stats->avg_us + us, n + 1);
+}
+
+static void drm_sched_client_work(struct kthread_work *work)
+{
+	struct test_client *client = container_of(work, typeof(*client), work);
+	const long sync_wait = MAX_SCHEDULE_TIMEOUT;
+	unsigned int cycle, work_us, period_us;
+	struct drm_mock_sched_job *job = NULL;
+
+	work_us = client->params.job_cnt * client->params.job_us;
+	period_us = work_us + client->params.wait_us;
+	client->cycles = DIV_ROUND_UP(ktime_to_us(client->duration), period_us);
+	client->ideal_duration = us_to_ktime(client->cycles * period_us);
+
+	client->start = ktime_get();
+
+	for (cycle = 0; cycle < client->cycles; cycle++) {
+		unsigned int batch;
+		unsigned long us;
+		ktime_t t;
+
+		if (READ_ONCE(client->done))
+			break;
+
+		t = ktime_get();
+		for (batch = 0; batch < client->params.job_cnt; batch++) {
+			job = drm_mock_sched_job_new(client->test,
+						     client->entity);
+			drm_mock_sched_job_set_duration_us(job,
+							   client->params.job_us);
+			drm_mock_sched_job_submit(job);
+		}
+
+		if (client->params.sync)
+			drm_mock_sched_job_wait_finished(job, sync_wait);
+
+		t = ktime_sub(ktime_get(), t);
+		us = ktime_to_us(t);
+		update_stats(&client->cycle_time, cycle, us);
+		if (ktime_to_us(t) >= (long)work_us)
+			us = ktime_to_us(t) - work_us;
+		else if (WARN_ON_ONCE(client->params.sync))
+			us = 0;
+		update_stats(&client->latency_time, cycle, us);
+		WRITE_ONCE(client->cycle, cycle);
+
+		if (READ_ONCE(client->done))
+			break;
+
+		if (client->params.wait_us)
+			fsleep(client->params.wait_us);
+		else
+			cond_resched();
+	}
+
+	client->done = drm_mock_sched_job_wait_finished(job, sync_wait);
+	client->end = ktime_get();
+}
+
+static const char *prio_str(enum drm_sched_priority prio)
+{
+	switch (prio) {
+	case DRM_SCHED_PRIORITY_KERNEL:
+		return "kernel";
+	case DRM_SCHED_PRIORITY_LOW:
+		return "low";
+	case DRM_SCHED_PRIORITY_NORMAL:
+		return "normal";
+	case DRM_SCHED_PRIORITY_HIGH:
+		return "high";
+	default:
+		return "???";
+	}
+}
+
+static void drm_sched_scheduler_two_clients_test(struct kunit *test)
+{
+	const struct drm_sched_test_params *params = test->param_value;
+	struct drm_mock_scheduler *sched = test->priv;
+	struct test_client client[2] = { };
+	unsigned int prev_cycle[2] = { };
+	unsigned int i, j;
+	ktime_t start;
+
+	/*
+	 * Same job stream from two clients.
+	 */
+
+	for (i = 0; i < ARRAY_SIZE(client); i++)
+		client[i].entity =
+			drm_mock_sched_entity_new(test,
+						  params->client[i].priority,
+						  sched);
+
+	for (i = 0; i < ARRAY_SIZE(client); i++) {
+		client[i].test = test;
+		client[i].id = i;
+		client[i].duration = ms_to_ktime(1000);
+		client[i].params = params->client[i];
+		client[i].cycle_time.min_us = ~0UL;
+		client[i].latency_time.min_us = ~0UL;
+		client[i].worker =
+			kthread_create_worker(0, "%s-%u", __func__, i);
+		if (IS_ERR(client[i].worker)) {
+			for (j = 0; j < i; j++)
+				kthread_destroy_worker(client[j].worker);
+			KUNIT_FAIL(test, "Failed to create worker!\n");
+		}
+
+		kthread_init_work(&client[i].work, drm_sched_client_work);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(client); i++)
+		kthread_queue_work(client[i].worker, &client[i].work);
+
+	/*
+	 * The clients (workers) can be a mix of async (deep submission queue),
+	 * sync (one job at a time), or something in between. Therefore it is
+	 * difficult to display a single metric representing their progress.
+	 *
+	 * Each struct drm_sched_client_params describes the actual submission
+	 * pattern which happens in the following steps:
+	 *  1. Submit N jobs
+	 *  2. Wait for last submitted job to finish
+	 *  3. Sleep for U micro-seconds
+	 *  4. Goto 1. for C cycles
+	 *
+	 * Where number of cycles is calculated to match the target client
+	 * duration from the respective struct drm_sched_test_params.
+	 *
+	 * To assess scheduling behaviour what we output for both clients is:
+	 *  - pct: Percentage progress of the jobs submitted
+	 *  - cps: "Cycles" per second (one cycle being one iteration of 1.-4. above)
+	 *  -  qd: Number of outstanding jobs in the client/entity
+	 */
+
+	start = ktime_get();
+	pr_info("%s:\n\t            pct1 cps1 qd1;  pct2 cps2 qd2\n",
+		params->description);
+	while (!READ_ONCE(client[0].done) || !READ_ONCE(client[1].done)) {
+		unsigned int pct[2], qd[2], cycle[2], cps[2];
+
+		for (i = 0; i < ARRAY_SIZE(client); i++) {
+			qd[i] = spsc_queue_count(&client[i].entity->base.job_queue);
+			cycle[i] = READ_ONCE(client[i].cycle);
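+			/* Cycles in the last ~100ms sample, scaled to per second. */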
+			cps[i] = DIV_ROUND_UP(1000 * (cycle[i] - prev_cycle[i]),
+					      100);
+			if (client[i].cycles)
+				pct[i] = DIV_ROUND_UP(100 * (1 + cycle[i]),
+						      client[i].cycles);
+			else
+				pct[i] = 0;
+			prev_cycle[i] = cycle[i];
+		}
+
+		if (READ_ONCE(client[0].done))
+			pr_info("\t+%6lldms:               ; %3u %5u %4u\n",
+				ktime_to_ms(ktime_sub(ktime_get(), start)),
+				pct[1], cps[1], qd[1]);
+		else if (READ_ONCE(client[1].done))
+			pr_info("\t+%6lldms: %3u %5u %4u;\n",
+				ktime_to_ms(ktime_sub(ktime_get(), start)),
+				pct[0], cps[0], qd[0]);
+		else
+			pr_info("\t+%6lldms: %3u %5u %4u; %3u %5u %4u\n",
+				ktime_to_ms(ktime_sub(ktime_get(), start)),
+				pct[0], cps[0], qd[0],
+				pct[1], cps[1], qd[1]);
+		msleep(100);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(client); i++) {
+		kthread_flush_work(&client[i].work);
+		kthread_destroy_worker(client[i].worker);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(client); i++)
+		KUNIT_ASSERT_TRUE(test, client[i].done);
+
+	for (i = 0; i < ARRAY_SIZE(client); i++) {
+		pr_info("    %u: prio=%s sync=%u elapsed_ms=%lldms (ideal_ms=%lldms) cycle_time(min,avg,max)=%lu,%lu,%lu us latency_time(min,avg,max)=%lu,%lu,%lu us",
+			i,
+			prio_str(params->client[i].priority),
+			params->client[i].sync,
+			ktime_to_ms(ktime_sub(client[i].end, client[i].start)),
+			ktime_to_ms(client[i].ideal_duration),
+			client[i].cycle_time.min_us,
+			client[i].cycle_time.avg_us,
+			client[i].cycle_time.max_us,
+			client[i].latency_time.min_us,
+			client[i].latency_time.avg_us,
+			client[i].latency_time.max_us);
+		drm_mock_sched_entity_free(client[i].entity);
+	}
+}
+
+static const struct kunit_attributes drm_sched_scheduler_two_clients_attr = {
+	.speed = KUNIT_SPEED_SLOW,
+};
+
+static struct kunit_case drm_sched_scheduler_two_clients_tests[] = {
+	KUNIT_CASE_PARAM_ATTR(drm_sched_scheduler_two_clients_test,
+			      drm_sched_scheduler_two_clients_gen_params,
+			      drm_sched_scheduler_two_clients_attr),
+	{}
+};
+
+static struct kunit_suite drm_sched_scheduler_two_clients1 = {
+	.name = "drm_sched_scheduler_two_clients_one_credit_tests",
+	.init = drm_sched_scheduler_init,
+	.exit = drm_sched_scheduler_exit,
+	.test_cases = drm_sched_scheduler_two_clients_tests,
+};
+
+static struct kunit_suite drm_sched_scheduler_two_clients2 = {
+	.name = "drm_sched_scheduler_two_clients_two_credits_tests",
+	.init = drm_sched_scheduler_init2,
+	.exit = drm_sched_scheduler_exit,
+	.test_cases = drm_sched_scheduler_two_clients_tests,
+};
+
+kunit_test_suites(&drm_sched_scheduler_overhead,
+		  &drm_sched_scheduler_two_clients1,
+		  &drm_sched_scheduler_two_clients2);
-- 
2.48.0


* [PATCH v6 02/15] drm/sched: Add some more scheduling quality unit tests
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 01/15] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 03/15] drm/sched: Avoid double re-lock on the job free path Tvrtko Ursulin
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner, Pierre-Eric Pelloux-Prayer

This time round we explore the rate of submitted job queue processing
with multiple identical parallel clients.

Example test output:

3 clients:
        t               cycle:     min  avg max : ...
        +     0ms                   0    0    0 :   0   0   0
        +   102ms                   2    2    2 :   2   2   2
        +   208ms                   5    6    6 :   6   5   5
        +   310ms                   8    9    9 :   9   9   8
...
        +  2616ms                  82   83   83 :  83  83  82
        +  2717ms                  83   83   83 :  83  83  83
    avg_max_min_delta(x100)=60

Every 100ms for the duration of the test, the test logs how many jobs each
client has completed, prefixed by the minimum, average and maximum numbers
across clients. When finished, the overall average delta between max and min
is output as a rough indicator of scheduling fairness.
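
(Concretely, the fairness indicator is the sum of the per-sample (max - min)
deltas, multiplied by 100 and divided by the number of 100ms samples; lower
values mean the clients progressed more evenly.)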

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
 .../gpu/drm/scheduler/tests/tests_scheduler.c | 186 +++++++++++++++++-
 1 file changed, 185 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/tests/tests_scheduler.c b/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
index b66321ef7abe..d70b47d7bf7a 100644
--- a/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
+++ b/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
@@ -181,6 +181,7 @@ struct drm_sched_client_params {
 
 struct drm_sched_test_params {
 	const char *description;
+	unsigned int num_clients;
 	struct drm_sched_client_params client[2];
 };
 
@@ -626,6 +627,189 @@ static struct kunit_suite drm_sched_scheduler_two_clients2 = {
 	.test_cases = drm_sched_scheduler_two_clients_tests,
 };
 
+
+static const struct drm_sched_test_params drm_sched_many_cases[] = {
+	{
+		.description = "2 clients",
+		.num_clients = 2,
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 4,
+			.job_us = 1000,
+			.wait_us = 0,
+			.sync = true,
+		},
+	},
+	{
+		.description = "3 clients",
+		.num_clients = 3,
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 4,
+			.job_us = 1000,
+			.wait_us = 0,
+			.sync = true,
+		},
+	},
+	{
+		.description = "7 clients",
+		.num_clients = 7,
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 4,
+			.job_us = 1000,
+			.wait_us = 0,
+			.sync = true,
+		},
+	},
+	{
+		.description = "13 clients",
+		.num_clients = 13,
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 4,
+			.job_us = 1000,
+			.wait_us = 0,
+			.sync = true,
+		},
+	},
+	{
+		.description = "31 clients",
+		.num_clients = 31,
+		.client[0] = {
+			.priority = DRM_SCHED_PRIORITY_NORMAL,
+			.job_cnt = 2,
+			.job_us = 1000,
+			.wait_us = 0,
+			.sync = true,
+		},
+	},
+};
+
+KUNIT_ARRAY_PARAM(drm_sched_scheduler_many_clients,
+		  drm_sched_many_cases,
+		  drm_sched_desc);
+
+static void drm_sched_scheduler_many_clients_test(struct kunit *test)
+{
+	const struct drm_sched_test_params *params = test->param_value;
+	struct drm_mock_scheduler *sched = test->priv;
+	const unsigned int clients = params->num_clients;
+	unsigned int i, j, delta_total = 0, loops = 0;
+	struct test_client *client;
+	unsigned int *prev_cycle;
+	ktime_t start;
+	char *buf;
+
+	/*
+	 * Many clients with deep-ish async queues.
+	 */
+
+	buf = kunit_kmalloc(test, PAGE_SIZE, GFP_KERNEL);
+	client = kunit_kcalloc(test, clients, sizeof(*client), GFP_KERNEL);
+	prev_cycle = kunit_kcalloc(test, clients, sizeof(*prev_cycle),
+				   GFP_KERNEL);
+
+	for (i = 0; i < clients; i++)
+		client[i].entity =
+			drm_mock_sched_entity_new(test,
+						  DRM_SCHED_PRIORITY_NORMAL,
+						  sched);
+
+	for (i = 0; i < clients; i++) {
+		client[i].test = test;
+		client[i].id = i;
+		client[i].params = params->client[0];
+		client[i].duration = ms_to_ktime(1000 / clients);
+		client[i].cycle_time.min_us = ~0UL;
+		client[i].latency_time.min_us = ~0UL;
+		client[i].worker =
+			kthread_create_worker(0, "%s-%u", __func__, i);
+		if (IS_ERR(client[i].worker)) {
+			for (j = 0; j < i; j++)
+				kthread_destroy_worker(client[j].worker);
+			KUNIT_FAIL(test, "Failed to create worker!\n");
+		}
+
+		kthread_init_work(&client[i].work, drm_sched_client_work);
+	}
+
+	for (i = 0; i < clients; i++)
+		kthread_queue_work(client[i].worker, &client[i].work);
+
+	start = ktime_get();
+	pr_info("%u clients:\n\tt\t\tcycle:\t  min    avg    max : ...\n", clients);
+	for (;;) {
+		unsigned int min = ~0;
+		unsigned int max = 0;
+		unsigned int total = 0;
+		bool done = true;
+		char pbuf[16];
+
+		memset(buf, 0, PAGE_SIZE);
+		for (i = 0; i < clients; i++) {
+			unsigned int cycle, cycles;
+
+			cycle = READ_ONCE(client[i].cycle);
+			cycles = READ_ONCE(client[i].cycles);
+
+			snprintf(pbuf, sizeof(pbuf), " %3d", cycle);
+			strncat(buf, pbuf, PAGE_SIZE);
+
+			total += cycle;
+			if (cycle < min)
+				min = cycle;
+			if (cycle > max)
+				max = cycle;
+
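+			/* Done once every client has completed a cycle and reached its last. */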
+			if (!min || (cycle + 1) < cycles)
+				done = false;
+		}
+
+		loops++;
+		delta_total += max - min;
+
+		pr_info("\t+%6lldms\t\t  %3u  %3u  %3u :%s\n",
+			ktime_to_ms(ktime_sub(ktime_get(), start)),
+			min, DIV_ROUND_UP(total, clients), max, buf);
+
+		if (done)
+			break;
+
+		msleep(100);
+	}
+
+	pr_info("    avg_max_min_delta(x100)=%u\n",
+		loops ? DIV_ROUND_UP(delta_total * 100, loops) : 0);
+
+	for (i = 0; i < clients; i++) {
+		kthread_flush_work(&client[i].work);
+		kthread_destroy_worker(client[i].worker);
+	}
+
+	for (i = 0; i < clients; i++)
+		drm_mock_sched_entity_free(client[i].entity);
+}
+
+static const struct kunit_attributes drm_sched_scheduler_many_clients_attr = {
+	.speed = KUNIT_SPEED_SLOW,
+};
+
+static struct kunit_case drm_sched_scheduler_many_clients_tests[] = {
+	KUNIT_CASE_PARAM_ATTR(drm_sched_scheduler_many_clients_test,
+			      drm_sched_scheduler_many_clients_gen_params,
+			      drm_sched_scheduler_many_clients_attr),
+	{}
+};
+
+static struct kunit_suite drm_sched_scheduler_many_clients = {
+	.name = "drm_sched_scheduler_many_clients_tests",
+	.init = drm_sched_scheduler_init2,
+	.exit = drm_sched_scheduler_exit,
+	.test_cases = drm_sched_scheduler_many_clients_tests,
+};
+
 kunit_test_suites(&drm_sched_scheduler_overhead,
 		  &drm_sched_scheduler_two_clients1,
-		  &drm_sched_scheduler_two_clients2);
+		  &drm_sched_scheduler_two_clients2,
+		  &drm_sched_scheduler_many_clients);
-- 
2.48.0


* [PATCH v6 03/15] drm/sched: Avoid double re-lock on the job free path
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 01/15] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 02/15] drm/sched: Add some more " Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08 11:22   ` Philipp Stanner
  2025-07-08  9:51 ` [PATCH v6 04/15] drm/sched: Consolidate drm_sched_job_timedout Tvrtko Ursulin
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

Currently the job free work item locks sched->job_list_lock a first time
to see if there are any jobs, frees a single job, and then locks again to
decide whether to re-queue itself if there are more finished jobs.

Since drm_sched_get_finished_job() already looks at the second job in the
queue we can simply add the signaled check and have it return the presence
of more jobs to free to the caller. That way the work item does not have
to lock the list again and repeat the signaled check.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_main.c | 37 ++++++++++----------------
 1 file changed, 14 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 1f077782ec12..1bce0b66f89c 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -366,22 +366,6 @@ static void __drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
 		queue_work(sched->submit_wq, &sched->work_free_job);
 }
 
-/**
- * drm_sched_run_free_queue - enqueue free-job work if ready
- * @sched: scheduler instance
- */
-static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
-{
-	struct drm_sched_job *job;
-
-	spin_lock(&sched->job_list_lock);
-	job = list_first_entry_or_null(&sched->pending_list,
-				       struct drm_sched_job, list);
-	if (job && dma_fence_is_signaled(&job->s_fence->finished))
-		__drm_sched_run_free_queue(sched);
-	spin_unlock(&sched->job_list_lock);
-}
-
 /**
  * drm_sched_job_done - complete a job
  * @s_job: pointer to the job which is done
@@ -1102,12 +1086,13 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
  * drm_sched_get_finished_job - fetch the next finished job to be destroyed
  *
  * @sched: scheduler instance
+ * @have_more: are there more finished jobs on the list
  *
  * Returns the next finished job from the pending list (if there is one)
  * ready for it to be destroyed.
  */
 static struct drm_sched_job *
-drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
+drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool *have_more)
 {
 	struct drm_sched_job *job, *next;
 
@@ -1115,22 +1100,25 @@ drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
 
 	job = list_first_entry_or_null(&sched->pending_list,
 				       struct drm_sched_job, list);
-
 	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
 		/* remove job from pending_list */
 		list_del_init(&job->list);
 
 		/* cancel this job's TO timer */
 		cancel_delayed_work(&sched->work_tdr);
-		/* make the scheduled timestamp more accurate */
+
+		*have_more = false;
 		next = list_first_entry_or_null(&sched->pending_list,
 						typeof(*next), list);
-
 		if (next) {
+			/* make the scheduled timestamp more accurate */
 			if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
 				     &next->s_fence->scheduled.flags))
 				next->s_fence->scheduled.timestamp =
 					dma_fence_timestamp(&job->s_fence->finished);
+
+			*have_more = dma_fence_is_signaled(&next->s_fence->finished);
+
 			/* start TO timer for next job */
 			drm_sched_start_timeout(sched);
 		}
@@ -1189,12 +1177,15 @@ static void drm_sched_free_job_work(struct work_struct *w)
 	struct drm_gpu_scheduler *sched =
 		container_of(w, struct drm_gpu_scheduler, work_free_job);
 	struct drm_sched_job *job;
+	bool have_more;
 
-	job = drm_sched_get_finished_job(sched);
-	if (job)
+	job = drm_sched_get_finished_job(sched, &have_more);
+	if (job) {
 		sched->ops->free_job(job);
+		if (have_more)
+			__drm_sched_run_free_queue(sched);
+	}
 
-	drm_sched_run_free_queue(sched);
 	drm_sched_run_job_queue(sched);
 }
 
-- 
2.48.0


* [PATCH v6 04/15] drm/sched: Consolidate drm_sched_job_timedout
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2025-07-08  9:51 ` [PATCH v6 03/15] drm/sched: Avoid double re-lock on the job free path Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 05/15] drm/sched: Consolidate drm_sched_rq_select_entity_rr Tvrtko Ursulin
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

Reduce to one spin_unlock for a hopefully slightly clearer flow in the
function. It may appear that there is a behavioural change, with
drm_sched_start_timeout_unlocked() now not being called if there were
initially no jobs on the pending list and some then appeared after the
unlock. However, if the code relied on the TDR handler restarting itself,
it would equally fail to do so if the job arrived on the pending list
after the check.
Also fix one stale comment while touching the function.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_main.c | 33 +++++++++++++-------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 1bce0b66f89c..2335df4de2b8 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -517,17 +517,15 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
 
 static void drm_sched_job_timedout(struct work_struct *work)
 {
-	struct drm_gpu_scheduler *sched;
+	struct drm_gpu_scheduler *sched =
+		container_of(work, struct drm_gpu_scheduler, work_tdr.work);
+	enum drm_gpu_sched_stat status;
 	struct drm_sched_job *job;
-	enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL;
-
-	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
 	/* Protects against concurrent deletion in drm_sched_get_finished_job */
 	spin_lock(&sched->job_list_lock);
 	job = list_first_entry_or_null(&sched->pending_list,
 				       struct drm_sched_job, list);
-
 	if (job) {
 		/*
 		 * Remove the bad job so it cannot be freed by a concurrent
@@ -536,20 +534,21 @@ static void drm_sched_job_timedout(struct work_struct *work)
 		 * cancelled, at which point it's safe.
 		 */
 		list_del_init(&job->list);
-		spin_unlock(&sched->job_list_lock);
+	}
+	spin_unlock(&sched->job_list_lock);
 
-		status = job->sched->ops->timedout_job(job);
+	if (!job)
+		return;
 
-		/*
-		 * Guilty job did complete and hence needs to be manually removed
-		 * See drm_sched_stop doc.
-		 */
-		if (sched->free_guilty) {
-			job->sched->ops->free_job(job);
-			sched->free_guilty = false;
-		}
-	} else {
-		spin_unlock(&sched->job_list_lock);
+	status = job->sched->ops->timedout_job(job);
+
+	/*
+	 * Guilty job did complete and hence needs to be manually removed. See
+	 * documentation for drm_sched_stop.
+	 */
+	if (sched->free_guilty) {
+		job->sched->ops->free_job(job);
+		sched->free_guilty = false;
 	}
 
 	if (status != DRM_GPU_SCHED_STAT_ENODEV)
-- 
2.48.0


* [PATCH v6 05/15] drm/sched: Consolidate drm_sched_rq_select_entity_rr
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2025-07-08  9:51 ` [PATCH v6 04/15] drm/sched: Consolidate drm_sched_job_timedout Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08 11:31   ` Philipp Stanner
  2025-07-08  9:51 ` [PATCH v6 06/15] drm/sched: Implement RR via FIFO Tvrtko Ursulin
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

Extract the two identical copies of the code into a function epilogue to
make the function smaller and more readable.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_main.c | 48 +++++++++++---------------
 1 file changed, 20 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 2335df4de2b8..9212ebae7476 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -263,38 +263,14 @@ drm_sched_rq_select_entity_rr(struct drm_gpu_scheduler *sched,
 	entity = rq->current_entity;
 	if (entity) {
 		list_for_each_entry_continue(entity, &rq->entities, list) {
-			if (drm_sched_entity_is_ready(entity)) {
-				/* If we can't queue yet, preserve the current
-				 * entity in terms of fairness.
-				 */
-				if (!drm_sched_can_queue(sched, entity)) {
-					spin_unlock(&rq->lock);
-					return ERR_PTR(-ENOSPC);
-				}
-
-				rq->current_entity = entity;
-				reinit_completion(&entity->entity_idle);
-				spin_unlock(&rq->lock);
-				return entity;
-			}
+			if (drm_sched_entity_is_ready(entity))
+				goto found;
 		}
 	}
 
 	list_for_each_entry(entity, &rq->entities, list) {
-		if (drm_sched_entity_is_ready(entity)) {
-			/* If we can't queue yet, preserve the current entity in
-			 * terms of fairness.
-			 */
-			if (!drm_sched_can_queue(sched, entity)) {
-				spin_unlock(&rq->lock);
-				return ERR_PTR(-ENOSPC);
-			}
-
-			rq->current_entity = entity;
-			reinit_completion(&entity->entity_idle);
-			spin_unlock(&rq->lock);
-			return entity;
-		}
+		if (drm_sched_entity_is_ready(entity))
+			goto found;
 
 		if (entity == rq->current_entity)
 			break;
@@ -303,6 +279,22 @@ drm_sched_rq_select_entity_rr(struct drm_gpu_scheduler *sched,
 	spin_unlock(&rq->lock);
 
 	return NULL;
+
+found:
+	if (!drm_sched_can_queue(sched, entity)) {
+		/*
+		 * If scheduler cannot take more jobs signal the caller to not
+		 * consider lower priority queues.
+		 */
+		entity = ERR_PTR(-ENOSPC);
+	} else {
+		rq->current_entity = entity;
+		reinit_completion(&entity->entity_idle);
+	}
+
+	spin_unlock(&rq->lock);
+
+	return entity;
 }
 
 /**
-- 
2.48.0


* [PATCH v6 06/15] drm/sched: Implement RR via FIFO
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2025-07-08  9:51 ` [PATCH v6 05/15] drm/sched: Consolidate drm_sched_rq_select_entity_rr Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 07/15] drm/sched: Consolidate entity run queue management Tvrtko Ursulin
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

Round-robin being the non-default policy, and it being unclear how much it is
used, we can notice that it can be implemented using the FIFO data structures
if we simply invent a fake submit timestamp which is monotonically increasing
inside drm_sched_rq instances.

So instead of remembering which was the last entity the scheduler worker
picked, we can bump the picked one to the bottom of the tree, achieving
the same round-robin behaviour.

The advantage is that we can consolidate to a single code path and remove a
bunch of code. The downside is that round-robin mode now needs to take a lock
on the job pop path, but that should not be noticeable.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 46 ++++++++------
 drivers/gpu/drm/scheduler/sched_main.c   | 76 ++----------------------
 include/drm/gpu_scheduler.h              |  6 +-
 3 files changed, 37 insertions(+), 91 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 0bdf52e27461..04350097f85f 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -470,9 +470,19 @@ drm_sched_job_dependency(struct drm_sched_job *job,
 	return NULL;
 }
 
+static ktime_t
+drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
+{
+	lockdep_assert_held(&rq->lock);
+
+	rq->rr_deadline = ktime_add_ns(rq->rr_deadline, 1);
+
+	return rq->rr_deadline;
+}
+
 struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 {
-	struct drm_sched_job *sched_job;
+	struct drm_sched_job *sched_job, *next_job;
 
 	sched_job = drm_sched_entity_queue_peek(entity);
 	if (!sched_job)
@@ -507,21 +517,21 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 	 * Update the entity's location in the min heap according to
 	 * the timestamp of the next job, if any.
 	 */
-	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
-		struct drm_sched_job *next;
+	next_job = drm_sched_entity_queue_peek(entity);
+	if (next_job) {
+		struct drm_sched_rq *rq;
+		ktime_t ts;
 
-		next = drm_sched_entity_queue_peek(entity);
-		if (next) {
-			struct drm_sched_rq *rq;
-
-			spin_lock(&entity->lock);
-			rq = entity->rq;
-			spin_lock(&rq->lock);
-			drm_sched_rq_update_fifo_locked(entity, rq,
-							next->submit_ts);
-			spin_unlock(&rq->lock);
-			spin_unlock(&entity->lock);
-		}
+		spin_lock(&entity->lock);
+		rq = entity->rq;
+		spin_lock(&rq->lock);
+		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+			ts = next_job->submit_ts;
+		else
+			ts = drm_sched_rq_get_rr_deadline(rq);
+		drm_sched_rq_update_fifo_locked(entity, rq, ts);
+		spin_unlock(&rq->lock);
+		spin_unlock(&entity->lock);
 	}
 
 	/* Jobs and entities might have different lifecycles. Since we're
@@ -628,9 +638,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 
 		spin_lock(&rq->lock);
 		drm_sched_rq_add_entity(rq, entity);
-
-		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-			drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
+		if (drm_sched_policy == DRM_SCHED_POLICY_RR)
+			submit_ts = drm_sched_rq_get_rr_deadline(rq);
+		drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
 
 		spin_unlock(&rq->lock);
 		spin_unlock(&entity->lock);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 9212ebae7476..5c6a3a9dfe5b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -185,7 +185,6 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
 	spin_lock_init(&rq->lock);
 	INIT_LIST_HEAD(&rq->entities);
 	rq->rb_tree_root = RB_ROOT_CACHED;
-	rq->current_entity = NULL;
 	rq->sched = sched;
 }
 
@@ -231,74 +230,13 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 	atomic_dec(rq->sched->score);
 	list_del_init(&entity->list);
 
-	if (rq->current_entity == entity)
-		rq->current_entity = NULL;
-
-	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-		drm_sched_rq_remove_fifo_locked(entity, rq);
+	drm_sched_rq_remove_fifo_locked(entity, rq);
 
 	spin_unlock(&rq->lock);
 }
 
 /**
- * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
- *
- * @sched: the gpu scheduler
- * @rq: scheduler run queue to check.
- *
- * Try to find the next ready entity.
- *
- * Return an entity if one is found; return an error-pointer (!NULL) if an
- * entity was ready, but the scheduler had insufficient credits to accommodate
- * its job; return NULL, if no ready entity was found.
- */
-static struct drm_sched_entity *
-drm_sched_rq_select_entity_rr(struct drm_gpu_scheduler *sched,
-			      struct drm_sched_rq *rq)
-{
-	struct drm_sched_entity *entity;
-
-	spin_lock(&rq->lock);
-
-	entity = rq->current_entity;
-	if (entity) {
-		list_for_each_entry_continue(entity, &rq->entities, list) {
-			if (drm_sched_entity_is_ready(entity))
-				goto found;
-		}
-	}
-
-	list_for_each_entry(entity, &rq->entities, list) {
-		if (drm_sched_entity_is_ready(entity))
-			goto found;
-
-		if (entity == rq->current_entity)
-			break;
-	}
-
-	spin_unlock(&rq->lock);
-
-	return NULL;
-
-found:
-	if (!drm_sched_can_queue(sched, entity)) {
-		/*
-		 * If scheduler cannot take more jobs signal the caller to not
-		 * consider lower priority queues.
-		 */
-		entity = ERR_PTR(-ENOSPC);
-	} else {
-		rq->current_entity = entity;
-		reinit_completion(&entity->entity_idle);
-	}
-
-	spin_unlock(&rq->lock);
-
-	return entity;
-}
-
-/**
- * drm_sched_rq_select_entity_fifo - Select an entity which provides a job to run
+ * drm_sched_rq_select_entity - Select an entity which provides a job to run
  *
  * @sched: the gpu scheduler
  * @rq: scheduler run queue to check.
@@ -310,8 +248,8 @@ drm_sched_rq_select_entity_rr(struct drm_gpu_scheduler *sched,
  * its job; return NULL, if no ready entity was found.
  */
 static struct drm_sched_entity *
-drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched,
-				struct drm_sched_rq *rq)
+drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
+			   struct drm_sched_rq *rq)
 {
 	struct rb_node *rb;
 
@@ -1057,15 +995,13 @@ void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
 static struct drm_sched_entity *
 drm_sched_select_entity(struct drm_gpu_scheduler *sched)
 {
-	struct drm_sched_entity *entity;
+	struct drm_sched_entity *entity = NULL;
 	int i;
 
 	/* Start with the highest priority.
 	 */
 	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
-		entity = drm_sched_policy == DRM_SCHED_POLICY_FIFO ?
-			drm_sched_rq_select_entity_fifo(sched, sched->sched_rq[i]) :
-			drm_sched_rq_select_entity_rr(sched, sched->sched_rq[i]);
+		entity = drm_sched_rq_select_entity(sched, sched->sched_rq[i]);
 		if (entity)
 			break;
 	}
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index e62a7214e052..74356aa5b6fb 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -239,8 +239,8 @@ struct drm_sched_entity {
  * struct drm_sched_rq - queue of entities to be scheduled.
  *
  * @sched: the scheduler to which this rq belongs to.
- * @lock: protects @entities, @rb_tree_root and @current_entity.
- * @current_entity: the entity which is to be scheduled.
+ * @lock: protects @entities, @rb_tree_root and @rr_deadline.
+ * @rr_deadline: monotonically incrementing fake timestamp for RR mode
  * @entities: list of the entities to be scheduled.
  * @rb_tree_root: root of time based priority queue of entities for FIFO scheduling
  *
@@ -253,7 +253,7 @@ struct drm_sched_rq {
 
 	spinlock_t			lock;
 	/* Following members are protected by the @lock: */
-	struct drm_sched_entity		*current_entity;
+	ktime_t				rr_deadline;
 	struct list_head		entities;
 	struct rb_root_cached		rb_tree_root;
 };
-- 
2.48.0
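
The diff above implements round-robin on top of the FIFO rb-tree: every
(re)inserted entity is handed a monotonically increasing fake timestamp,
so the tree's oldest-first ordering cycles through entities in re-queue
order. A minimal userspace model of the idea follows - an illustrative
sketch only, not the kernel code:

#include <stdio.h>

/* Each re-queued entity gets the next fake "deadline"; always picking
 * the smallest deadline services entities in strict re-queue order,
 * which is round-robin when every pick of a still non-empty entity is
 * followed by a re-queue. */
static long long rr_deadline;

static long long get_rr_deadline(void)
{
	return ++rr_deadline;
}

int main(void)
{
	long long a, b, c;

	/* Three entities queue their first jobs. */
	a = get_rr_deadline(); /* a = 1 */
	b = get_rr_deadline(); /* b = 2 */
	c = get_rr_deadline(); /* c = 3 */

	/* A runs first (smallest key) and, having more jobs, re-queues. */
	a = get_rr_deadline(); /* a = 4, now behind b and c */

	printf("next: %s\n", b < c ? "B" : "C"); /* prints "next: B" */
	return 0;
}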


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 07/15] drm/sched: Consolidate entity run queue management
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
                   ` (5 preceding siblings ...)
  2025-07-08  9:51 ` [PATCH v6 06/15] drm/sched: Implement RR via FIFO Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 08/15] drm/sched: Move run queue related code into a separate file Tvrtko Ursulin
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

Move the code dealing with entities entering and exiting run queues into
helpers, to logically separate it from the handling of jobs entering and
exiting entities.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c   | 59 ++------------
 drivers/gpu/drm/scheduler/sched_internal.h |  8 +-
 drivers/gpu/drm/scheduler/sched_main.c     | 90 +++++++++++++++++++---
 3 files changed, 86 insertions(+), 71 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 04350097f85f..e42526aa22dc 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -470,19 +470,9 @@ drm_sched_job_dependency(struct drm_sched_job *job,
 	return NULL;
 }
 
-static ktime_t
-drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
-{
-	lockdep_assert_held(&rq->lock);
-
-	rq->rr_deadline = ktime_add_ns(rq->rr_deadline, 1);
-
-	return rq->rr_deadline;
-}
-
 struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 {
-	struct drm_sched_job *sched_job, *next_job;
+	struct drm_sched_job *sched_job;
 
 	sched_job = drm_sched_entity_queue_peek(entity);
 	if (!sched_job)
@@ -513,26 +503,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 
 	spsc_queue_pop(&entity->job_queue);
 
-	/*
-	 * Update the entity's location in the min heap according to
-	 * the timestamp of the next job, if any.
-	 */
-	next_job = drm_sched_entity_queue_peek(entity);
-	if (next_job) {
-		struct drm_sched_rq *rq;
-		ktime_t ts;
-
-		spin_lock(&entity->lock);
-		rq = entity->rq;
-		spin_lock(&rq->lock);
-		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-			ts = next_job->submit_ts;
-		else
-			ts = drm_sched_rq_get_rr_deadline(rq);
-		drm_sched_rq_update_fifo_locked(entity, rq, ts);
-		spin_unlock(&rq->lock);
-		spin_unlock(&entity->lock);
-	}
+	drm_sched_rq_pop_entity(entity);
 
 	/* Jobs and entities might have different lifecycles. Since we're
 	 * removing the job from the entities queue, set the jobs entity pointer
@@ -622,30 +593,10 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 	/* first job wakes up scheduler */
 	if (first) {
 		struct drm_gpu_scheduler *sched;
-		struct drm_sched_rq *rq;
 
-		/* Add the entity to the run queue */
-		spin_lock(&entity->lock);
-		if (entity->stopped) {
-			spin_unlock(&entity->lock);
-
-			DRM_ERROR("Trying to push to a killed entity\n");
-			return;
-		}
-
-		rq = entity->rq;
-		sched = rq->sched;
-
-		spin_lock(&rq->lock);
-		drm_sched_rq_add_entity(rq, entity);
-		if (drm_sched_policy == DRM_SCHED_POLICY_RR)
-			submit_ts = drm_sched_rq_get_rr_deadline(rq);
-		drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
-
-		spin_unlock(&rq->lock);
-		spin_unlock(&entity->lock);
-
-		drm_sched_wakeup(sched);
+		sched = drm_sched_rq_add_entity(entity, submit_ts);
+		if (sched)
+			drm_sched_wakeup(sched);
 	}
 }
 EXPORT_SYMBOL(drm_sched_entity_push_job);
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index 7ea5a6736f98..8269c5392a82 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -12,13 +12,11 @@ extern int drm_sched_policy;
 
 void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
 
-void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
-			     struct drm_sched_entity *entity);
+struct drm_gpu_scheduler *
+drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts);
 void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 				struct drm_sched_entity *entity);
-
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
-				     struct drm_sched_rq *rq, ktime_t ts);
+void drm_sched_rq_pop_entity(struct drm_sched_entity *entity);
 
 void drm_sched_entity_select_rq(struct drm_sched_entity *entity);
 struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 5c6a3a9dfe5b..796f0b218cf7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -151,9 +151,9 @@ static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
 	}
 }
 
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
-				     struct drm_sched_rq *rq,
-				     ktime_t ts)
+static void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+					    struct drm_sched_rq *rq,
+					    ktime_t ts)
 {
 	/*
 	 * Both locks need to be grabbed, one to protect from entity->rq change
@@ -188,25 +188,58 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
 	rq->sched = sched;
 }
 
+static ktime_t
+drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
+{
+	lockdep_assert_held(&rq->lock);
+
+	rq->rr_deadline = ktime_add_ns(rq->rr_deadline, 1);
+
+	return rq->rr_deadline;
+}
+
 /**
  * drm_sched_rq_add_entity - add an entity
  *
- * @rq: scheduler run queue
  * @entity: scheduler entity
+ * @ts: submission timestamp
  *
  * Adds a scheduler entity to the run queue.
+ *
+ * Returns a DRM scheduler pre-selected to handle this entity.
  */
-void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
-			     struct drm_sched_entity *entity)
+struct drm_gpu_scheduler *
+drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts)
 {
-	lockdep_assert_held(&entity->lock);
-	lockdep_assert_held(&rq->lock);
+	struct drm_gpu_scheduler *sched;
+	struct drm_sched_rq *rq;
 
-	if (!list_empty(&entity->list))
-		return;
+	/* Add the entity to the run queue */
+	spin_lock(&entity->lock);
+	if (entity->stopped) {
+		spin_unlock(&entity->lock);
 
-	atomic_inc(rq->sched->score);
-	list_add_tail(&entity->list, &rq->entities);
+		DRM_ERROR("Trying to push to a killed entity\n");
+		return NULL;
+	}
+
+	rq = entity->rq;
+	spin_lock(&rq->lock);
+	sched = rq->sched;
+
+	if (list_empty(&entity->list)) {
+		atomic_inc(sched->score);
+		list_add_tail(&entity->list, &rq->entities);
+	}
+
+	if (drm_sched_policy == DRM_SCHED_POLICY_RR)
+		ts = drm_sched_rq_get_rr_deadline(rq);
+	drm_sched_rq_update_fifo_locked(entity, rq, ts);
+
+	spin_unlock(&rq->lock);
+	spin_unlock(&entity->lock);
+
+	return sched;
 }
 
 /**
@@ -235,6 +268,39 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 	spin_unlock(&rq->lock);
 }
 
+/**
+ * drm_sched_rq_pop_entity - pops an entity
+ *
+ * @entity: scheduler entity
+ *
+ * To be called every time after a job is popped from the entity.
+ */
+void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
+{
+	struct drm_sched_job *next_job;
+	struct drm_sched_rq *rq;
+	ktime_t ts;
+
+	/*
+	 * Update the entity's location in the min heap according to
+	 * the timestamp of the next job, if any.
+	 */
+	next_job = drm_sched_entity_queue_peek(entity);
+	if (!next_job)
+		return;
+
+	spin_lock(&entity->lock);
+	rq = entity->rq;
+	spin_lock(&rq->lock);
+	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+		ts = next_job->submit_ts;
+	else
+		ts = drm_sched_rq_get_rr_deadline(rq);
+	drm_sched_rq_update_fifo_locked(entity, rq, ts);
+	spin_unlock(&rq->lock);
+	spin_unlock(&entity->lock);
+}
+
 /**
  * drm_sched_rq_select_entity - Select an entity which provides a job to run
  *
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 08/15] drm/sched: Move run queue related code into a separate file
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
                   ` (6 preceding siblings ...)
  2025-07-08  9:51 ` [PATCH v6 07/15] drm/sched: Consolidate entity run queue management Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 09/15] drm/sched: Free all finished jobs at once Tvrtko Ursulin
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

Let's move all the code dealing with struct drm_sched_rq into a separate
compilation unit. The advantage is that sched_main.c is left with a
clearer set of responsibilities.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/Makefile         |   2 +-
 drivers/gpu/drm/scheduler/sched_internal.h |   7 +
 drivers/gpu/drm/scheduler/sched_main.c     | 213 +-------------------
 drivers/gpu/drm/scheduler/sched_rq.c       | 217 +++++++++++++++++++++
 4 files changed, 227 insertions(+), 212 deletions(-)
 create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c

diff --git a/drivers/gpu/drm/scheduler/Makefile b/drivers/gpu/drm/scheduler/Makefile
index 6e13e4c63e9d..74e75eff6df5 100644
--- a/drivers/gpu/drm/scheduler/Makefile
+++ b/drivers/gpu/drm/scheduler/Makefile
@@ -20,7 +20,7 @@
 # OTHER DEALINGS IN THE SOFTWARE.
 #
 #
-gpu-sched-y := sched_main.o sched_fence.o sched_entity.o
+gpu-sched-y := sched_main.o sched_fence.o sched_entity.o sched_rq.o
 
 obj-$(CONFIG_DRM_SCHED) += gpu-sched.o
 
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index 8269c5392a82..703ee48fbc58 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -10,8 +10,15 @@ extern int drm_sched_policy;
 #define DRM_SCHED_POLICY_RR    0
 #define DRM_SCHED_POLICY_FIFO  1
 
+bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
+			 struct drm_sched_entity *entity);
 void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
 
+void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
+		       struct drm_sched_rq *rq);
+struct drm_sched_entity *
+drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
+			   struct drm_sched_rq *rq);
 struct drm_gpu_scheduler *
 drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts);
 void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 796f0b218cf7..c906c2f3e6d5 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -112,8 +112,8 @@ static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
  * Return true if we can push at least one more job from @entity, false
  * otherwise.
  */
-static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
-				struct drm_sched_entity *entity)
+bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
+			 struct drm_sched_entity *entity)
 {
 	struct drm_sched_job *s_job;
 
@@ -133,215 +133,6 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
 	return drm_sched_available_credits(sched) >= s_job->credits;
 }
 
-static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
-							    const struct rb_node *b)
-{
-	struct drm_sched_entity *ent_a =  rb_entry((a), struct drm_sched_entity, rb_tree_node);
-	struct drm_sched_entity *ent_b =  rb_entry((b), struct drm_sched_entity, rb_tree_node);
-
-	return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting);
-}
-
-static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
-					    struct drm_sched_rq *rq)
-{
-	if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
-		rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
-		RB_CLEAR_NODE(&entity->rb_tree_node);
-	}
-}
-
-static void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
-					    struct drm_sched_rq *rq,
-					    ktime_t ts)
-{
-	/*
-	 * Both locks need to be grabbed, one to protect from entity->rq change
-	 * for entity from within concurrent drm_sched_entity_select_rq and the
-	 * other to update the rb tree structure.
-	 */
-	lockdep_assert_held(&entity->lock);
-	lockdep_assert_held(&rq->lock);
-
-	drm_sched_rq_remove_fifo_locked(entity, rq);
-
-	entity->oldest_job_waiting = ts;
-
-	rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
-		      drm_sched_entity_compare_before);
-}
-
-/**
- * drm_sched_rq_init - initialize a given run queue struct
- *
- * @sched: scheduler instance to associate with this run queue
- * @rq: scheduler run queue
- *
- * Initializes a scheduler runqueue.
- */
-static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
-			      struct drm_sched_rq *rq)
-{
-	spin_lock_init(&rq->lock);
-	INIT_LIST_HEAD(&rq->entities);
-	rq->rb_tree_root = RB_ROOT_CACHED;
-	rq->sched = sched;
-}
-
-static ktime_t
-drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
-{
-	lockdep_assert_held(&rq->lock);
-
-	rq->rr_deadline = ktime_add_ns(rq->rr_deadline, 1);
-
-	return rq->rr_deadline;
-}
-
-/**
- * drm_sched_rq_add_entity - add an entity
- *
- * @entity: scheduler entity
- * @ts: submission timestamp
- *
- * Adds a scheduler entity to the run queue.
- *
- * Returns a DRM scheduler pre-selected to handle this entity.
- */
-struct drm_gpu_scheduler *
-drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts)
-{
-	struct drm_gpu_scheduler *sched;
-	struct drm_sched_rq *rq;
-
-	/* Add the entity to the run queue */
-	spin_lock(&entity->lock);
-	if (entity->stopped) {
-		spin_unlock(&entity->lock);
-
-		DRM_ERROR("Trying to push to a killed entity\n");
-		return NULL;
-	}
-
-	rq = entity->rq;
-	spin_lock(&rq->lock);
-	sched = rq->sched;
-
-	if (list_empty(&entity->list)) {
-		atomic_inc(sched->score);
-		list_add_tail(&entity->list, &rq->entities);
-	}
-
-	if (drm_sched_policy == DRM_SCHED_POLICY_RR)
-		ts = drm_sched_rq_get_rr_deadline(rq);
-	drm_sched_rq_update_fifo_locked(entity, rq, ts);
-
-	spin_unlock(&rq->lock);
-	spin_unlock(&entity->lock);
-
-	return sched;
-}
-
-/**
- * drm_sched_rq_remove_entity - remove an entity
- *
- * @rq: scheduler run queue
- * @entity: scheduler entity
- *
- * Removes a scheduler entity from the run queue.
- */
-void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
-				struct drm_sched_entity *entity)
-{
-	lockdep_assert_held(&entity->lock);
-
-	if (list_empty(&entity->list))
-		return;
-
-	spin_lock(&rq->lock);
-
-	atomic_dec(rq->sched->score);
-	list_del_init(&entity->list);
-
-	drm_sched_rq_remove_fifo_locked(entity, rq);
-
-	spin_unlock(&rq->lock);
-}
-
-/**
- * drm_sched_rq_pop_entity - pops an entity
- *
- * @entity: scheduler entity
- *
- * To be called every time after a job is popped from the entity.
- */
-void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
-{
-	struct drm_sched_job *next_job;
-	struct drm_sched_rq *rq;
-	ktime_t ts;
-
-	/*
-	 * Update the entity's location in the min heap according to
-	 * the timestamp of the next job, if any.
-	 */
-	next_job = drm_sched_entity_queue_peek(entity);
-	if (!next_job)
-		return;
-
-	spin_lock(&entity->lock);
-	rq = entity->rq;
-	spin_lock(&rq->lock);
-	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-		ts = next_job->submit_ts;
-	else
-		ts = drm_sched_rq_get_rr_deadline(rq);
-	drm_sched_rq_update_fifo_locked(entity, rq, ts);
-	spin_unlock(&rq->lock);
-	spin_unlock(&entity->lock);
-}
-
-/**
- * drm_sched_rq_select_entity - Select an entity which provides a job to run
- *
- * @sched: the gpu scheduler
- * @rq: scheduler run queue to check.
- *
- * Find oldest waiting ready entity.
- *
- * Return an entity if one is found; return an error-pointer (!NULL) if an
- * entity was ready, but the scheduler had insufficient credits to accommodate
- * its job; return NULL, if no ready entity was found.
- */
-static struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
-			   struct drm_sched_rq *rq)
-{
-	struct rb_node *rb;
-
-	spin_lock(&rq->lock);
-	for (rb = rb_first_cached(&rq->rb_tree_root); rb; rb = rb_next(rb)) {
-		struct drm_sched_entity *entity;
-
-		entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
-		if (drm_sched_entity_is_ready(entity)) {
-			/* If we can't queue yet, preserve the current entity in
-			 * terms of fairness.
-			 */
-			if (!drm_sched_can_queue(sched, entity)) {
-				spin_unlock(&rq->lock);
-				return ERR_PTR(-ENOSPC);
-			}
-
-			reinit_completion(&entity->entity_idle);
-			break;
-		}
-	}
-	spin_unlock(&rq->lock);
-
-	return rb ? rb_entry(rb, struct drm_sched_entity, rb_tree_node) : NULL;
-}
-
 /**
  * drm_sched_run_job_queue - enqueue run-job work
  * @sched: scheduler instance
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
new file mode 100644
index 000000000000..0492d1f21572
--- /dev/null
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -0,0 +1,217 @@
+#include <linux/rbtree.h>
+
+#include <drm/drm_print.h>
+#include <drm/gpu_scheduler.h>
+
+#include "sched_internal.h"
+
+static __always_inline bool
+drm_sched_entity_compare_before(struct rb_node *a, const struct rb_node *b)
+{
+	struct drm_sched_entity *ea =
+		rb_entry((a), struct drm_sched_entity, rb_tree_node);
+	struct drm_sched_entity *eb =
+		rb_entry((b), struct drm_sched_entity, rb_tree_node);
+
+	return ktime_before(ea->oldest_job_waiting, eb->oldest_job_waiting);
+}
+
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
+					    struct drm_sched_rq *rq)
+{
+	if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
+		rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
+		RB_CLEAR_NODE(&entity->rb_tree_node);
+	}
+}
+
+static void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+					    struct drm_sched_rq *rq,
+					    ktime_t ts)
+{
+	/*
+	 * Both locks need to be grabbed, one to protect from entity->rq change
+	 * for entity from within concurrent drm_sched_entity_select_rq and the
+	 * other to update the rb tree structure.
+	 */
+	lockdep_assert_held(&entity->lock);
+	lockdep_assert_held(&rq->lock);
+
+	drm_sched_rq_remove_fifo_locked(entity, rq);
+
+	entity->oldest_job_waiting = ts;
+
+	rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
+		      drm_sched_entity_compare_before);
+}
+
+/**
+ * drm_sched_rq_init - initialize a given run queue struct
+ *
+ * @sched: scheduler instance to associate with this run queue
+ * @rq: scheduler run queue
+ *
+ * Initializes a scheduler runqueue.
+ */
+void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
+		       struct drm_sched_rq *rq)
+{
+	spin_lock_init(&rq->lock);
+	INIT_LIST_HEAD(&rq->entities);
+	rq->rb_tree_root = RB_ROOT_CACHED;
+	rq->sched = sched;
+}
+
+static ktime_t
+drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
+{
+	lockdep_assert_held(&rq->lock);
+
+	rq->rr_deadline = ktime_add_ns(rq->rr_deadline, 1);
+
+	return rq->rr_deadline;
+}
+
+/**
+ * drm_sched_rq_add_entity - add an entity
+ *
+ * @entity: scheduler entity
+ * @ts: submission timestamp
+ *
+ * Adds a scheduler entity to the run queue.
+ *
+ * Returns a DRM scheduler pre-selected to handle this entity.
+ */
+struct drm_gpu_scheduler *
+drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts)
+{
+	struct drm_gpu_scheduler *sched;
+	struct drm_sched_rq *rq;
+
+	/* Add the entity to the run queue */
+	spin_lock(&entity->lock);
+	if (entity->stopped) {
+		spin_unlock(&entity->lock);
+
+		DRM_ERROR("Trying to push to a killed entity\n");
+		return NULL;
+	}
+
+	rq = entity->rq;
+	spin_lock(&rq->lock);
+	sched = rq->sched;
+
+	if (list_empty(&entity->list)) {
+		atomic_inc(sched->score);
+		list_add_tail(&entity->list, &rq->entities);
+	}
+
+	if (drm_sched_policy == DRM_SCHED_POLICY_RR)
+		ts = drm_sched_rq_get_rr_deadline(rq);
+	drm_sched_rq_update_fifo_locked(entity, rq, ts);
+
+	spin_unlock(&rq->lock);
+	spin_unlock(&entity->lock);
+
+	return sched;
+}
+
+/**
+ * drm_sched_rq_remove_entity - remove an entity
+ *
+ * @rq: scheduler run queue
+ * @entity: scheduler entity
+ *
+ * Removes a scheduler entity from the run queue.
+ */
+void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
+				struct drm_sched_entity *entity)
+{
+	lockdep_assert_held(&entity->lock);
+
+	if (list_empty(&entity->list))
+		return;
+
+	spin_lock(&rq->lock);
+
+	atomic_dec(rq->sched->score);
+	list_del_init(&entity->list);
+
+	drm_sched_rq_remove_fifo_locked(entity, rq);
+
+	spin_unlock(&rq->lock);
+}
+
+/**
+ * drm_sched_rq_pop_entity - pops an entity
+ *
+ * @entity: scheduler entity
+ *
+ * To be called every time after a job is popped from the entity.
+ */
+void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
+{
+	struct drm_sched_job *next_job;
+	struct drm_sched_rq *rq;
+	ktime_t ts;
+
+	/*
+	 * Update the entity's location in the min heap according to
+	 * the timestamp of the next job, if any.
+	 */
+	next_job = drm_sched_entity_queue_peek(entity);
+	if (!next_job)
+		return;
+
+	spin_lock(&entity->lock);
+	rq = entity->rq;
+	spin_lock(&rq->lock);
+	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+		ts = next_job->submit_ts;
+	else
+		ts = drm_sched_rq_get_rr_deadline(rq);
+	drm_sched_rq_update_fifo_locked(entity, rq, ts);
+	spin_unlock(&rq->lock);
+	spin_unlock(&entity->lock);
+}
+
+/**
+ * drm_sched_rq_select_entity - Select an entity which provides a job to run
+ *
+ * @sched: the gpu scheduler
+ * @rq: scheduler run queue to check.
+ *
+ * Find oldest waiting ready entity.
+ *
+ * Return an entity if one is found; return an error-pointer (!NULL) if an
+ * entity was ready, but the scheduler had insufficient credits to accommodate
+ * its job; return NULL, if no ready entity was found.
+ */
+struct drm_sched_entity *
+drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
+			   struct drm_sched_rq *rq)
+{
+	struct rb_node *rb;
+
+	spin_lock(&rq->lock);
+	for (rb = rb_first_cached(&rq->rb_tree_root); rb; rb = rb_next(rb)) {
+		struct drm_sched_entity *entity;
+
+		entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
+		if (drm_sched_entity_is_ready(entity)) {
+			/* If we can't queue yet, preserve the current entity in
+			 * terms of fairness.
+			 */
+			if (!drm_sched_can_queue(sched, entity)) {
+				spin_unlock(&rq->lock);
+				return ERR_PTR(-ENOSPC);
+			}
+
+			reinit_completion(&entity->entity_idle);
+			break;
+		}
+	}
+	spin_unlock(&rq->lock);
+
+	return rb ? rb_entry(rb, struct drm_sched_entity, rb_tree_node) : NULL;
+}
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 09/15] drm/sched: Free all finished jobs at once
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
                   ` (7 preceding siblings ...)
  2025-07-08  9:51 ` [PATCH v6 08/15] drm/sched: Move run queue related code into a separate file Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 10/15] drm/sched: Account entity GPU time Tvrtko Ursulin
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

To implement fair scheduling we will need an as accurate as possible view
into per-entity GPU time utilisation. Because sched fence execution times
are only adjusted for accuracy in the free worker, we need to process
completed jobs as soon as possible so the metric is as up to date as
possible when viewed from the submission side of things.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_main.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index c906c2f3e6d5..f87bb4681b93 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -870,13 +870,12 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
  * drm_sched_get_finished_job - fetch the next finished job to be destroyed
  *
  * @sched: scheduler instance
- * @have_more: are there more finished jobs on the list
  *
  * Returns the next finished job from the pending list (if there is one)
  * ready for it to be destroyed.
  */
 static struct drm_sched_job *
-drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool *have_more)
+drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
 {
 	struct drm_sched_job *job, *next;
 
@@ -891,7 +890,6 @@ drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool *have_more)
 		/* cancel this job's TO timer */
 		cancel_delayed_work(&sched->work_tdr);
 
-		*have_more = false;
 		next = list_first_entry_or_null(&sched->pending_list,
 						typeof(*next), list);
 		if (next) {
@@ -901,8 +899,6 @@ drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool *have_more)
 				next->s_fence->scheduled.timestamp =
 					dma_fence_timestamp(&job->s_fence->finished);
 
-			*have_more = dma_fence_is_signaled(&next->s_fence->finished);
-
 			/* start TO timer for next job */
 			drm_sched_start_timeout(sched);
 		}
@@ -961,14 +957,9 @@ static void drm_sched_free_job_work(struct work_struct *w)
 	struct drm_gpu_scheduler *sched =
 		container_of(w, struct drm_gpu_scheduler, work_free_job);
 	struct drm_sched_job *job;
-	bool have_more;
 
-	job = drm_sched_get_finished_job(sched, &have_more);
-	if (job) {
+	while ((job = drm_sched_get_finished_job(sched)))
 		sched->ops->free_job(job);
-		if (have_more)
-			__drm_sched_run_free_queue(sched);
-	}
 
 	drm_sched_run_job_queue(sched);
 }
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 10/15] drm/sched: Account entity GPU time
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
                   ` (8 preceding siblings ...)
  2025-07-08  9:51 ` [PATCH v6 09/15] drm/sched: Free all finished jobs at once Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 11/15] drm/sched: Remove idle entity from tree Tvrtko Ursulin
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

To implement fair scheduling we need a view into the GPU time consumed by
entities. The problem is that job and entity objects have decoupled
lifetimes, so at the point where we finally have an accurate view of the
GPU time we can no longer link back to the entity.

Solve this by adding a lightweight entity stats object which is reference
counted by both the entity and the job, and hence can safely be used from
either side.

With that, the only other thing we need is to add a helper for adding the
job's GPU time into the respective entity stats object, and call it once
the accurate GPU time has been calculated.
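
As an aside, the resulting object lifecycle can be modelled in a few
lines of userspace C. This is an illustrative sketch only - C11 atomics
stand in for the kernel's kref and error handling is elided - not the
actual implementation:

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct entity_stats {
	atomic_int refcount;
	long long runtime_ns;
};

static struct entity_stats *stats_alloc(void)
{
	struct entity_stats *stats = calloc(1, sizeof(*stats));

	atomic_init(&stats->refcount, 1);
	return stats;
}

static struct entity_stats *stats_get(struct entity_stats *stats)
{
	atomic_fetch_add(&stats->refcount, 1);
	return stats;
}

static void stats_put(struct entity_stats *stats)
{
	if (atomic_fetch_sub(&stats->refcount, 1) == 1)
		free(stats);
}

int main(void)
{
	struct entity_stats *entity_ref = stats_alloc();       /* entity init */
	struct entity_stats *job_ref = stats_get(entity_ref);  /* job arm */

	stats_put(entity_ref);          /* entity torn down early... */
	job_ref->runtime_ns += 8000000; /* ...job still accounts its time */
	printf("runtime %lld ns\n", job_ref->runtime_ns);
	stats_put(job_ref);             /* last reference drops, object freed */
	return 0;
}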

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c   | 39 +++++++++++++
 drivers/gpu/drm/scheduler/sched_internal.h | 66 ++++++++++++++++++++++
 drivers/gpu/drm/scheduler/sched_main.c     |  6 +-
 include/drm/gpu_scheduler.h                | 12 ++++
 4 files changed, 122 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index e42526aa22dc..99a71fb388c3 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -32,6 +32,39 @@
 
 #include "gpu_scheduler_trace.h"
 
+
+/**
+ * drm_sched_entity_stats_release - Entity stats kref release function
+ *
+ * @kref: Entity stats embedded kref pointer
+ */
+void drm_sched_entity_stats_release(struct kref *kref)
+{
+	struct drm_sched_entity_stats *stats =
+		container_of(kref, typeof(*stats), kref);
+
+	kfree(stats);
+}
+
+/**
+ * drm_sched_entity_stats_alloc - Allocate a new struct drm_sched_entity_stats object
+ *
+ * Returns: Pointer to newly allocated struct drm_sched_entity_stats object.
+ */
+static struct drm_sched_entity_stats *drm_sched_entity_stats_alloc(void)
+{
+	struct drm_sched_entity_stats *stats;
+
+	stats = kzalloc(sizeof(*stats), GFP_KERNEL);
+	if (!stats)
+		return NULL;
+
+	kref_init(&stats->kref);
+	spin_lock_init(&stats->lock);
+
+	return stats;
+}
+
 /**
  * drm_sched_entity_init - Init a context entity used by scheduler when
  * submit to HW ring.
@@ -65,6 +98,11 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 		return -EINVAL;
 
 	memset(entity, 0, sizeof(struct drm_sched_entity));
+
+	entity->stats = drm_sched_entity_stats_alloc();
+	if (!entity->stats)
+		return -ENOMEM;
+
 	INIT_LIST_HEAD(&entity->list);
 	entity->rq = NULL;
 	entity->guilty = guilty;
@@ -340,6 +378,7 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity)
 
 	dma_fence_put(rcu_dereference_check(entity->last_scheduled, true));
 	RCU_INIT_POINTER(entity->last_scheduled, NULL);
+	drm_sched_entity_stats_put(entity->stats);
 }
 EXPORT_SYMBOL(drm_sched_entity_fini);
 
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index 703ee48fbc58..27c8460a3601 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -3,6 +3,22 @@
 #ifndef _DRM_GPU_SCHEDULER_INTERNAL_H_
 #define _DRM_GPU_SCHEDULER_INTERNAL_H_
 
+#include <linux/ktime.h>
+#include <linux/kref.h>
+#include <linux/spinlock.h>
+
+/**
+ * struct drm_sched_entity_stats - execution stats for an entity.
+ *
+ * @kref: reference count for the object.
+ * @lock: lock guarding the @runtime updates.
+ * @runtime: time entity spent on the GPU.
+ */
+struct drm_sched_entity_stats {
+	struct kref	kref;
+	spinlock_t	lock;
+	ktime_t		runtime;
+};
 
 /* Used to choose between FIFO and RR job-scheduling */
 extern int drm_sched_policy;
@@ -93,4 +109,54 @@ drm_sched_entity_is_ready(struct drm_sched_entity *entity)
 	return true;
 }
 
+void drm_sched_entity_stats_release(struct kref *kref);
+
+/**
+ * drm_sched_entity_stats_get - Obtain a reference count on struct drm_sched_entity_stats object
+ *
+ * @stats: struct drm_sched_entity_stats pointer
+ *
+ * Returns: struct drm_sched_entity_stats pointer
+ */
+static inline struct drm_sched_entity_stats *
+drm_sched_entity_stats_get(struct drm_sched_entity_stats *stats)
+{
+	kref_get(&stats->kref);
+
+	return stats;
+}
+
+/**
+ * drm_sched_entity_stats_put - Release a reference count on struct drm_sched_entity_stats object
+ *
+ * @stats: struct drm_sched_entity_stats pointer
+ */
+static inline void
+drm_sched_entity_stats_put(struct drm_sched_entity_stats *stats)
+{
+	kref_put(&stats->kref, drm_sched_entity_stats_release);
+}
+
+/**
+ * drm_sched_entity_stats_job_add_gpu_time - Account job execution time to entity
+ *
+ * @job: Scheduler job to account.
+ *
+ * Accounts the execution time of @job to its respective entity stats object.
+ */
+static inline void
+drm_sched_entity_stats_job_add_gpu_time(struct drm_sched_job *job)
+{
+	struct drm_sched_entity_stats *stats = job->entity_stats;
+	struct drm_sched_fence *s_fence = job->s_fence;
+	ktime_t start, end;
+
+	start = dma_fence_timestamp(&s_fence->scheduled);
+	end = dma_fence_timestamp(&s_fence->finished);
+
+	spin_lock(&stats->lock);
+	stats->runtime = ktime_add(stats->runtime, ktime_sub(end, start));
+	spin_unlock(&stats->lock);
+}
+
 #endif
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index f87bb4681b93..22cc0dd536db 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -620,6 +620,7 @@ void drm_sched_job_arm(struct drm_sched_job *job)
 
 	job->sched = sched;
 	job->s_priority = entity->priority;
+	job->entity_stats = drm_sched_entity_stats_get(entity->stats);
 
 	drm_sched_fence_init(job->s_fence, job->entity);
 }
@@ -810,6 +811,7 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
 		 * been called.
 		 */
 		dma_fence_put(&job->s_fence->finished);
+		drm_sched_entity_stats_put(job->entity_stats);
 	} else {
 		/* The job was aborted before it has been committed to be run;
 		 * notably, drm_sched_job_arm() has not been called.
@@ -958,8 +960,10 @@ static void drm_sched_free_job_work(struct work_struct *w)
 		container_of(w, struct drm_gpu_scheduler, work_free_job);
 	struct drm_sched_job *job;
 
-	while ((job = drm_sched_get_finished_job(sched)))
+	while ((job = drm_sched_get_finished_job(sched))) {
+		drm_sched_entity_stats_job_add_gpu_time(job);
 		sched->ops->free_job(job);
+	}
 
 	drm_sched_run_job_queue(sched);
 }
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 74356aa5b6fb..bd7f7c6dd744 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -71,6 +71,8 @@ enum drm_sched_priority {
 	DRM_SCHED_PRIORITY_COUNT
 };
 
+struct drm_sched_entity_stats;
+
 /**
  * struct drm_sched_entity - A wrapper around a job queue (typically
  * attached to the DRM file_priv).
@@ -109,6 +111,11 @@ struct drm_sched_entity {
 	 */
 	struct drm_sched_rq		*rq;
 
+	/**
+	 * @stats: Stats object reference held by the entity and jobs.
+	 */
+	struct drm_sched_entity_stats	*stats;
+
 	/**
 	 * @sched_list:
 	 *
@@ -357,6 +364,11 @@ struct drm_sched_job {
 	struct drm_sched_fence		*s_fence;
 	struct drm_sched_entity         *entity;
 
+	/**
+	 * @entity_stats: Stats object reference held by the job and entity.
+	 */
+	struct drm_sched_entity_stats	*entity_stats;
+
 	enum drm_sched_priority		s_priority;
 	u32				credits;
 	/** @last_dependency: tracks @dependencies as they signal */
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 11/15] drm/sched: Remove idle entity from tree
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
                   ` (9 preceding siblings ...)
  2025-07-08  9:51 ` [PATCH v6 10/15] drm/sched: Account entity GPU time Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 12/15] drm/sched: Add fair scheduling policy Tvrtko Ursulin
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

There is no need to keep entities with no jobs in the tree, so let's
remove them once their last job is consumed. This keeps the tree smaller,
which is both tidier and more efficient given that entities are removed
and re-added on every popped job.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_rq.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index 0492d1f21572..9f1b4ea99cad 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -19,6 +19,9 @@ drm_sched_entity_compare_before(struct rb_node *a, const struct rb_node *b)
 static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
 					    struct drm_sched_rq *rq)
 {
+	lockdep_assert_held(&entity->lock);
+	lockdep_assert_held(&rq->lock);
+
 	if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
 		rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
 		RB_CLEAR_NODE(&entity->rb_tree_node);
@@ -153,24 +156,27 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
 {
 	struct drm_sched_job *next_job;
 	struct drm_sched_rq *rq;
-	ktime_t ts;
 
 	/*
 	 * Update the entity's location in the min heap according to
 	 * the timestamp of the next job, if any.
 	 */
+	spin_lock(&entity->lock);
+	rq = entity->rq;
+	spin_lock(&rq->lock);
 	next_job = drm_sched_entity_queue_peek(entity);
-	if (!next_job)
-		return;
+	if (next_job) {
+		ktime_t ts;
 
-	spin_lock(&entity->lock);
-	rq = entity->rq;
-	spin_lock(&rq->lock);
-	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-		ts = next_job->submit_ts;
-	else
-		ts = drm_sched_rq_get_rr_deadline(rq);
-	drm_sched_rq_update_fifo_locked(entity, rq, ts);
+		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+			ts = next_job->submit_ts;
+		else
+			ts = drm_sched_rq_get_rr_deadline(rq);
+
+		drm_sched_rq_update_fifo_locked(entity, rq, ts);
+	} else {
+		drm_sched_rq_remove_fifo_locked(entity, rq);
+	}
 	spin_unlock(&rq->lock);
 	spin_unlock(&entity->lock);
 }
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 12/15] drm/sched: Add fair scheduling policy
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
                   ` (10 preceding siblings ...)
  2025-07-08  9:51 ` [PATCH v6 11/15] drm/sched: Remove idle entity from tree Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 13/15] drm/sched: Remove FIFO and RR and simplify to a single run queue Tvrtko Ursulin
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner, Pierre-Eric Pelloux-Prayer

The fair scheduling policy is built upon the same concepts as the well
known CFS kernel scheduler - the entity run queue is sorted by the
virtual GPU time consumed by entities, in a way that the entity with the
least vruntime runs first.

It is able to avoid total priority starvation, which is one of the
problems with FIFO, and it also eliminates the need for per-priority run
queues. It scales the actual GPU runtime by an exponential factor as the
priority decreases, so the virtual runtime for low priority entities
grows faster than for normal priority ones, pushing them further down the
run queue order for the same real GPU time spent.
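
To make the scaling concrete, with the shift table added in this patch
(1, 2, 4 and 7 for kernel, high, normal and low priority respectively)
one millisecond of real GPU time inflates an entity's virtual runtime as
follows (illustrative arithmetic only):

#include <stdio.h>

int main(void)
{
	/* Shift values as in drm_sched_entity_update_vruntime(). */
	static const unsigned int shift[] = { 1, 2, 4, 7 };
	static const char *prio[] = { "kernel", "high", "normal", "low" };
	const long long real_ns = 1000000; /* 1 ms of actual GPU time */
	int i;

	for (i = 0; i < 4; i++)
		printf("%-6s: vruntime += %lld ms\n",
		       prio[i], (real_ns << shift[i]) / 1000000);
	/* kernel: 2 ms, high: 4 ms, normal: 16 ms, low: 128 ms */
	return 0;
}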

Apart from this fundamental fairness, the fair policy is especially
strong in oversubscription workloads, where it is able to give more GPU
time to short and bursty workloads running in parallel with GPU-heavy
clients submitting deep job queues.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c   | 28 +++++----
 drivers/gpu/drm/scheduler/sched_internal.h | 70 +++++++++++++++++++++-
 drivers/gpu/drm/scheduler/sched_main.c     | 14 +++--
 drivers/gpu/drm/scheduler/sched_rq.c       | 35 ++++++++++-
 include/drm/gpu_scheduler.h                |  3 +
 5 files changed, 131 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 99a71fb388c3..82b824436a60 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -108,6 +108,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	entity->guilty = guilty;
 	entity->num_sched_list = num_sched_list;
 	entity->priority = priority;
+	entity->rq_priority = drm_sched_policy == DRM_SCHED_POLICY_FAIR ?
+			      DRM_SCHED_PRIORITY_KERNEL : priority;
 	/*
 	 * It's perfectly valid to initialize an entity without having a valid
 	 * scheduler attached. It's just not valid to use the scheduler before it
@@ -124,17 +126,23 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 		 */
 		pr_warn("%s: called with uninitialized scheduler\n", __func__);
 	} else if (num_sched_list) {
-		/* The "priority" of an entity cannot exceed the number of run-queues of a
-		 * scheduler. Protect against num_rqs being 0, by converting to signed. Choose
-		 * the lowest priority available.
+		enum drm_sched_priority p = entity->priority;
+
+		/*
+		 * The "priority" of an entity cannot exceed the number of
+		 * run-queues of a scheduler. Protect against num_rqs being 0,
+		 * by converting to signed. Choose the lowest priority
+		 * available.
 		 */
-		if (entity->priority >= sched_list[0]->num_rqs) {
-			dev_err(sched_list[0]->dev, "entity has out-of-bounds priority: %u. num_rqs: %u\n",
-				entity->priority, sched_list[0]->num_rqs);
-			entity->priority = max_t(s32, (s32) sched_list[0]->num_rqs - 1,
-						 (s32) DRM_SCHED_PRIORITY_KERNEL);
+		if (p >= sched_list[0]->num_user_rqs) {
+			dev_err(sched_list[0]->dev, "entity with out-of-bounds priority:%u num_user_rqs:%u\n",
+				p, sched_list[0]->num_user_rqs);
+			p = max_t(s32,
+				 (s32)sched_list[0]->num_user_rqs - 1,
+				 (s32)DRM_SCHED_PRIORITY_KERNEL);
+			entity->priority = p;
 		}
-		entity->rq = sched_list[0]->sched_rq[entity->priority];
+		entity->rq = sched_list[0]->sched_rq[entity->rq_priority];
 	}
 
 	init_completion(&entity->entity_idle);
@@ -583,7 +591,7 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
 
 	spin_lock(&entity->lock);
 	sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
-	rq = sched ? sched->sched_rq[entity->priority] : NULL;
+	rq = sched ? sched->sched_rq[entity->rq_priority] : NULL;
 	if (rq != entity->rq) {
 		drm_sched_rq_remove_entity(entity->rq, entity);
 		entity->rq = rq;
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index 27c8460a3601..6f91780c1d4e 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -18,13 +18,16 @@ struct drm_sched_entity_stats {
 	struct kref	kref;
 	spinlock_t	lock;
 	ktime_t		runtime;
+	ktime_t		prev_runtime;
+	u64		vruntime;
 };
 
 /* Used to choose between FIFO and RR job-scheduling */
 extern int drm_sched_policy;
 
-#define DRM_SCHED_POLICY_RR    0
-#define DRM_SCHED_POLICY_FIFO  1
+#define DRM_SCHED_POLICY_RR	0
+#define DRM_SCHED_POLICY_FIFO	1
+#define DRM_SCHED_POLICY_FAIR	2
 
 bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
 			 struct drm_sched_entity *entity);
@@ -159,4 +162,67 @@ drm_sched_entity_stats_job_add_gpu_time(struct drm_sched_job *job)
 	spin_unlock(&stats->lock);
 }
 
+static inline void
+drm_sched_entity_save_vruntime(struct drm_sched_entity *entity,
+			       ktime_t min_vruntime)
+{
+	struct drm_sched_entity_stats *stats = entity->stats;
+	ktime_t vruntime;
+
+	spin_lock(&stats->lock);
+	vruntime = stats->vruntime;
+	if (min_vruntime && vruntime > min_vruntime)
+		vruntime = ktime_sub(vruntime, min_vruntime);
+	else
+		vruntime = 0;
+	stats->vruntime = vruntime;
+	spin_unlock(&stats->lock);
+}
+
+static inline ktime_t
+drm_sched_entity_restore_vruntime(struct drm_sched_entity *entity,
+				  ktime_t min_vruntime)
+{
+	struct drm_sched_entity_stats *stats = entity->stats;
+	ktime_t vruntime;
+
+	spin_lock(&stats->lock);
+	vruntime = ktime_add(min_vruntime, stats->vruntime);
+	stats->vruntime = vruntime;
+	spin_unlock(&stats->lock);
+
+	return vruntime;
+}
+
+static inline ktime_t
+drm_sched_entity_update_vruntime(struct drm_sched_entity *entity)
+{
+	static const unsigned int shift[] = {
+		[DRM_SCHED_PRIORITY_KERNEL] = 1,
+		[DRM_SCHED_PRIORITY_HIGH]   = 2,
+		[DRM_SCHED_PRIORITY_NORMAL] = 4,
+		[DRM_SCHED_PRIORITY_LOW]    = 7,
+	};
+	struct drm_sched_entity_stats *stats = entity->stats;
+	ktime_t runtime, prev;
+
+	spin_lock(&stats->lock);
+	prev = stats->prev_runtime;
+	runtime = stats->runtime;
+	stats->prev_runtime = runtime;
+	runtime = ktime_add_ns(stats->vruntime,
+			       ktime_to_ns(ktime_sub(runtime, prev)) <<
+			       shift[entity->priority]);
+	stats->vruntime = runtime;
+	spin_unlock(&stats->lock);
+
+	return runtime;
+}
+
+static inline ktime_t
+drm_sched_entity_get_job_ts(struct drm_sched_entity *entity)
+{
+	return drm_sched_entity_update_vruntime(entity);
+}
+
 #endif
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 22cc0dd536db..705049a1685a 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -84,13 +84,13 @@
 #define CREATE_TRACE_POINTS
 #include "gpu_scheduler_trace.h"
 
-int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
+int drm_sched_policy = DRM_SCHED_POLICY_FAIR;
 
 /**
  * DOC: sched_policy (int)
  * Used to override default entities scheduling policy in a run queue.
  */
-MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
+MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO, " __stringify(DRM_SCHED_POLICY_FAIR) " = Fair (default).");
 module_param_named(sched_policy, drm_sched_policy, int, 0444);
 
 static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
@@ -1093,11 +1093,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
 		sched->own_submit_wq = true;
 	}
 
-	sched->sched_rq = kmalloc_array(args->num_rqs, sizeof(*sched->sched_rq),
+	sched->num_user_rqs = args->num_rqs;
+	sched->num_rqs = drm_sched_policy != DRM_SCHED_POLICY_FAIR ?
+			 args->num_rqs : 1;
+	sched->sched_rq = kmalloc_array(sched->num_rqs,
+					sizeof(*sched->sched_rq),
 					GFP_KERNEL | __GFP_ZERO);
 	if (!sched->sched_rq)
 		goto Out_check_own;
-	sched->num_rqs = args->num_rqs;
+
 	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
 		sched->sched_rq[i] = kzalloc(sizeof(*sched->sched_rq[i]), GFP_KERNEL);
 		if (!sched->sched_rq[i])
@@ -1212,7 +1216,7 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
 	if (bad->s_priority != DRM_SCHED_PRIORITY_KERNEL) {
 		atomic_inc(&bad->karma);
 
-		for (i = DRM_SCHED_PRIORITY_HIGH; i < sched->num_rqs; i++) {
+		for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
 			struct drm_sched_rq *rq = sched->sched_rq[i];
 
 			spin_lock(&rq->lock);
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index 9f1b4ea99cad..a891c3231c4c 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -75,6 +75,23 @@ drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
 	return rq->rr_deadline;
 }
 
+static ktime_t
+drm_sched_rq_get_min_vruntime(struct drm_sched_rq *rq)
+{
+	struct drm_sched_entity *entity;
+	struct rb_node *rb;
+
+	lockdep_assert_held(&rq->lock);
+
+	for (rb = rb_first_cached(&rq->rb_tree_root); rb; rb = rb_next(rb)) {
+		entity = rb_entry(rb, typeof(*entity), rb_tree_node);
+
+		return entity->stats->vruntime; /* Unlocked read */
+	}
+
+	return 0;
+}
+
 /**
  * drm_sched_rq_add_entity - add an entity
  *
@@ -109,8 +126,13 @@ drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts)
 		list_add_tail(&entity->list, &rq->entities);
 	}
 
-	if (drm_sched_policy == DRM_SCHED_POLICY_RR)
+	if (drm_sched_policy == DRM_SCHED_POLICY_FAIR) {
+		ts = drm_sched_rq_get_min_vruntime(rq);
+		ts = drm_sched_entity_restore_vruntime(entity, ts);
+	} else if (drm_sched_policy == DRM_SCHED_POLICY_RR) {
 		ts = drm_sched_rq_get_rr_deadline(rq);
+	}
+
 	drm_sched_rq_update_fifo_locked(entity, rq, ts);
 
 	spin_unlock(&rq->lock);
@@ -168,7 +190,9 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
 	if (next_job) {
 		ktime_t ts;
 
-		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+		if (drm_sched_policy == DRM_SCHED_POLICY_FAIR)
+			ts = drm_sched_entity_get_job_ts(entity);
+		else if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
 			ts = next_job->submit_ts;
 		else
 			ts = drm_sched_rq_get_rr_deadline(rq);
@@ -176,6 +200,13 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
 		drm_sched_rq_update_fifo_locked(entity, rq, ts);
 	} else {
 		drm_sched_rq_remove_fifo_locked(entity, rq);
+
+		if (drm_sched_policy == DRM_SCHED_POLICY_FAIR) {
+			ktime_t min_vruntime;
+
+			min_vruntime = drm_sched_rq_get_min_vruntime(rq);
+			drm_sched_entity_save_vruntime(entity, min_vruntime);
+		}
 	}
 	spin_unlock(&rq->lock);
 	spin_unlock(&entity->lock);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index bd7f7c6dd744..aef86b78a901 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -154,6 +154,8 @@ struct drm_sched_entity {
 	 */
 	struct spsc_queue		job_queue;
 
+	enum drm_sched_priority         rq_priority;
+
 	/**
 	 * @fence_seq:
 	 *
@@ -568,6 +570,7 @@ struct drm_gpu_scheduler {
 	long				timeout;
 	const char			*name;
 	u32                             num_rqs;
+	u32                             num_user_rqs;
 	struct drm_sched_rq             **sched_rq;
 	wait_queue_head_t		job_scheduled;
 	atomic64_t			job_id_count;
-- 
2.48.0
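
One detail from the diff above is worth spelling out: when an entity runs
out of jobs it leaves the tree with only its lead over the queue's
minimum vruntime saved (drm_sched_entity_save_vruntime()), and on
re-joining that delta is added back on top of the then-current minimum
(drm_sched_entity_restore_vruntime()). An entity which was idle for a
long time therefore re-enters near the head of the queue, rather than
with a huge accumulated credit that would let it monopolise the GPU while
it catches up. A sketch of the arithmetic (illustrative only):

#include <stdio.h>

int main(void)
{
	/* Entity goes idle: save only its lead over the queue minimum,
	 * clamped at zero, mirroring drm_sched_entity_save_vruntime(). */
	long long vruntime = 130, min_vruntime = 100;
	long long saved = (min_vruntime && vruntime > min_vruntime) ?
			  vruntime - min_vruntime : 0;

	/* Later the queue minimum has moved on; re-join relative to the
	 * new minimum, as in drm_sched_entity_restore_vruntime(). */
	long long new_min = 5000;
	long long rejoin = new_min + saved;

	printf("saved delta %lld, re-joins at vruntime %lld\n",
	       saved, rejoin); /* saved delta 30, re-joins at vruntime 5030 */
	return 0;
}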


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 13/15] drm/sched: Remove FIFO and RR and simplify to a single run queue
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
                   ` (11 preceding siblings ...)
  2025-07-08  9:51 ` [PATCH v6 12/15] drm/sched: Add fair scheduling policy Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08  9:51 ` [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation Tvrtko Ursulin
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

If the new fair policy is at least as good as FIFO, and we can afford to
remove round-robin, we can simplify the scheduler code by making the
scheduler to run queue relationship always 1:1 and removing some code.

Also, now that the FIFO policy is gone, the tree of entities is not a
FIFO tree any more, so rename it to just the tree.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  23 ++--
 drivers/gpu/drm/scheduler/sched_entity.c   |  29 +----
 drivers/gpu/drm/scheduler/sched_internal.h |   9 +-
 drivers/gpu/drm/scheduler/sched_main.c     | 133 +++++----------------
 drivers/gpu/drm/scheduler/sched_rq.c       |  53 +++-----
 include/drm/gpu_scheduler.h                |  13 +-
 6 files changed, 62 insertions(+), 198 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 2b58e353cca1..c9534cc439b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -438,25 +438,22 @@ drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
 
 void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
 {
+	struct drm_sched_rq *rq = sched->rq;
+	struct drm_sched_entity *s_entity;
 	struct drm_sched_job *s_job;
-	struct drm_sched_entity *s_entity = NULL;
-	int i;
 
 	/* Signal all jobs not yet scheduled */
-	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
-		struct drm_sched_rq *rq = sched->sched_rq[i];
-		spin_lock(&rq->lock);
-		list_for_each_entry(s_entity, &rq->entities, list) {
-			while ((s_job = drm_sched_entity_queue_pop(s_entity))) {
-				struct drm_sched_fence *s_fence = s_job->s_fence;
+	spin_lock(&rq->lock);
+	list_for_each_entry(s_entity, &rq->entities, list) {
+		while ((s_job = drm_sched_entity_queue_pop(s_entity))) {
+			struct drm_sched_fence *s_fence = s_job->s_fence;
 
-				dma_fence_signal(&s_fence->scheduled);
-				dma_fence_set_error(&s_fence->finished, -EHWPOISON);
-				dma_fence_signal(&s_fence->finished);
-			}
+			dma_fence_signal(&s_fence->scheduled);
+			dma_fence_set_error(&s_fence->finished, -EHWPOISON);
+			dma_fence_signal(&s_fence->finished);
 		}
-		spin_unlock(&rq->lock);
 	}
+	spin_unlock(&rq->lock);
 
 	/* Signal all jobs already scheduled to HW */
 	list_for_each_entry(s_job, &sched->pending_list, list) {
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 82b824436a60..30dba2409ea4 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -108,8 +108,6 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	entity->guilty = guilty;
 	entity->num_sched_list = num_sched_list;
 	entity->priority = priority;
-	entity->rq_priority = drm_sched_policy == DRM_SCHED_POLICY_FAIR ?
-			      DRM_SCHED_PRIORITY_KERNEL : priority;
 	/*
 	 * It's perfectly valid to initialize an entity without having a valid
 	 * scheduler attached. It's just not valid to use the scheduler before it
@@ -119,30 +117,14 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	RCU_INIT_POINTER(entity->last_scheduled, NULL);
 	RB_CLEAR_NODE(&entity->rb_tree_node);
 
-	if (num_sched_list && !sched_list[0]->sched_rq) {
+	if (num_sched_list && !sched_list[0]->rq) {
 		/* Since every entry covered by num_sched_list
 		 * should be non-NULL and therefore we warn drivers
 		 * not to do this and to fix their DRM calling order.
 		 */
 		pr_warn("%s: called with uninitialized scheduler\n", __func__);
 	} else if (num_sched_list) {
-		enum drm_sched_priority p = entity->priority;
-
-		/*
-		 * The "priority" of an entity cannot exceed the number of
-		 * run-queues of a scheduler. Protect against num_rqs being 0,
-		 * by converting to signed. Choose the lowest priority
-		 * available.
-		 */
-		if (p >= sched_list[0]->num_user_rqs) {
-			dev_err(sched_list[0]->dev, "entity with out-of-bounds priority:%u num_user_rqs:%u\n",
-				p, sched_list[0]->num_user_rqs);
-			p = max_t(s32,
-				 (s32)sched_list[0]->num_user_rqs - 1,
-				 (s32)DRM_SCHED_PRIORITY_KERNEL);
-			entity->priority = p;
-		}
-		entity->rq = sched_list[0]->sched_rq[entity->rq_priority];
+		entity->rq = sched_list[0]->rq;
 	}
 
 	init_completion(&entity->entity_idle);
@@ -591,7 +573,7 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
 
 	spin_lock(&entity->lock);
 	sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
-	rq = sched ? sched->sched_rq[entity->rq_priority] : NULL;
+	rq = sched ? sched->rq : NULL;
 	if (rq != entity->rq) {
 		drm_sched_rq_remove_entity(entity->rq, entity);
 		entity->rq = rq;
@@ -615,7 +597,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 {
 	struct drm_sched_entity *entity = sched_job->entity;
 	bool first;
-	ktime_t submit_ts;
 
 	trace_drm_sched_job_queue(sched_job, entity);
 
@@ -632,16 +613,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 	/*
 	 * After the sched_job is pushed into the entity queue, it may be
 	 * completed and freed up at any time. We can no longer access it.
-	 * Make sure to set the submit_ts first, to avoid a race.
 	 */
-	sched_job->submit_ts = submit_ts = ktime_get();
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
 
 	/* first job wakes up scheduler */
 	if (first) {
 		struct drm_gpu_scheduler *sched;
 
-		sched = drm_sched_rq_add_entity(entity, submit_ts);
+		sched = drm_sched_rq_add_entity(entity);
 		if (sched)
 			drm_sched_wakeup(sched);
 	}
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index 6f91780c1d4e..15d78abc48df 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -22,13 +22,6 @@ struct drm_sched_entity_stats {
 	u64		vruntime;
 };
 
-/* Used to choose between FIFO and RR job-scheduling */
-extern int drm_sched_policy;
-
-#define DRM_SCHED_POLICY_RR	0
-#define DRM_SCHED_POLICY_FIFO	1
-#define DRM_SCHED_POLICY_FAIR	2
-
 bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
 			 struct drm_sched_entity *entity);
 void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
@@ -39,7 +32,7 @@ struct drm_sched_entity *
 drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
 			   struct drm_sched_rq *rq);
 struct drm_gpu_scheduler *
-drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts);
+drm_sched_rq_add_entity(struct drm_sched_entity *entity);
 void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 				struct drm_sched_entity *entity);
 void drm_sched_rq_pop_entity(struct drm_sched_entity *entity);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 705049a1685a..35025edea669 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -84,15 +84,6 @@
 #define CREATE_TRACE_POINTS
 #include "gpu_scheduler_trace.h"
 
-int drm_sched_policy = DRM_SCHED_POLICY_FAIR;
-
-/**
- * DOC: sched_policy (int)
- * Used to override default entities scheduling policy in a run queue.
- */
-MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO, " __stringify(DRM_SCHED_POLICY_FAIR) " = Fair (default).");
-module_param_named(sched_policy, drm_sched_policy, int, 0444);
-
 static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
 {
 	u32 credits;
@@ -840,34 +831,6 @@ void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
 	drm_sched_run_job_queue(sched);
 }
 
-/**
- * drm_sched_select_entity - Select next entity to process
- *
- * @sched: scheduler instance
- *
- * Return an entity to process or NULL if none are found.
- *
- * Note, that we break out of the for-loop when "entity" is non-null, which can
- * also be an error-pointer--this assures we don't process lower priority
- * run-queues. See comments in the respectively called functions.
- */
-static struct drm_sched_entity *
-drm_sched_select_entity(struct drm_gpu_scheduler *sched)
-{
-	struct drm_sched_entity *entity = NULL;
-	int i;
-
-	/* Start with the highest priority.
-	 */
-	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
-		entity = drm_sched_rq_select_entity(sched, sched->sched_rq[i]);
-		if (entity)
-			break;
-	}
-
-	return IS_ERR(entity) ? NULL : entity;
-}
-
 /**
  * drm_sched_get_finished_job - fetch the next finished job to be destroyed
  *
@@ -984,8 +947,8 @@ static void drm_sched_run_job_work(struct work_struct *w)
 	int r;
 
 	/* Find entity with a ready job */
-	entity = drm_sched_select_entity(sched);
-	if (!entity)
+	entity = drm_sched_rq_select_entity(sched, sched->rq);
+	if (IS_ERR_OR_NULL(entity))
 		return;	/* No more work */
 
 	sched_job = drm_sched_entity_pop_job(entity);
@@ -1056,8 +1019,6 @@ static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
  */
 int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_args *args)
 {
-	int i;
-
 	sched->ops = args->ops;
 	sched->credit_limit = args->credit_limit;
 	sched->name = args->name;
@@ -1067,13 +1028,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
 	sched->score = args->score ? args->score : &sched->_score;
 	sched->dev = args->dev;
 
-	if (args->num_rqs > DRM_SCHED_PRIORITY_COUNT) {
-		/* This is a gross violation--tell drivers what the  problem is.
-		 */
-		dev_err(sched->dev, "%s: num_rqs cannot be greater than DRM_SCHED_PRIORITY_COUNT\n",
-			__func__);
-		return -EINVAL;
-	} else if (sched->sched_rq) {
+	if (sched->rq) {
 		/* Not an error, but warn anyway so drivers can
 		 * fine-tune their DRM calling order, and return all
 		 * is good.
@@ -1093,21 +1048,11 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
 		sched->own_submit_wq = true;
 	}
 
-	sched->num_user_rqs = args->num_rqs;
-	sched->num_rqs = drm_sched_policy != DRM_SCHED_POLICY_FAIR ?
-			 args->num_rqs : 1;
-	sched->sched_rq = kmalloc_array(sched->num_rqs,
-					sizeof(*sched->sched_rq),
-					GFP_KERNEL | __GFP_ZERO);
-	if (!sched->sched_rq)
+	sched->rq = kmalloc(sizeof(*sched->rq), GFP_KERNEL | __GFP_ZERO);
+	if (!sched->rq)
 		goto Out_check_own;
 
-	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
-		sched->sched_rq[i] = kzalloc(sizeof(*sched->sched_rq[i]), GFP_KERNEL);
-		if (!sched->sched_rq[i])
-			goto Out_unroll;
-		drm_sched_rq_init(sched, sched->sched_rq[i]);
-	}
+	drm_sched_rq_init(sched, sched->rq);
 
 	init_waitqueue_head(&sched->job_scheduled);
 	INIT_LIST_HEAD(&sched->pending_list);
@@ -1122,12 +1067,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
 
 	sched->ready = true;
 	return 0;
-Out_unroll:
-	for (--i ; i >= DRM_SCHED_PRIORITY_KERNEL; i--)
-		kfree(sched->sched_rq[i]);
 
-	kfree(sched->sched_rq);
-	sched->sched_rq = NULL;
 Out_check_own:
 	if (sched->own_submit_wq)
 		destroy_workqueue(sched->submit_wq);
@@ -1159,25 +1099,21 @@ EXPORT_SYMBOL(drm_sched_init);
  */
 void drm_sched_fini(struct drm_gpu_scheduler *sched)
 {
+
+	struct drm_sched_rq *rq = sched->rq;
 	struct drm_sched_entity *s_entity;
-	int i;
 
 	drm_sched_wqueue_stop(sched);
 
-	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
-		struct drm_sched_rq *rq = sched->sched_rq[i];
-
-		spin_lock(&rq->lock);
-		list_for_each_entry(s_entity, &rq->entities, list)
-			/*
-			 * Prevents reinsertion and marks job_queue as idle,
-			 * it will be removed from the rq in drm_sched_entity_fini()
-			 * eventually
-			 */
-			s_entity->stopped = true;
-		spin_unlock(&rq->lock);
-		kfree(sched->sched_rq[i]);
-	}
+	spin_lock(&rq->lock);
+	list_for_each_entry(s_entity, &rq->entities, list)
+		/*
+		 * Prevents reinsertion and marks job_queue as idle,
+		 * it will be removed from the rq in drm_sched_entity_fini()
+		 * eventually
+		 */
+		s_entity->stopped = true;
+	spin_unlock(&rq->lock);
 
 	/* Wakeup everyone stuck in drm_sched_entity_flush for this scheduler */
 	wake_up_all(&sched->job_scheduled);
@@ -1188,8 +1124,8 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 	if (sched->own_submit_wq)
 		destroy_workqueue(sched->submit_wq);
 	sched->ready = false;
-	kfree(sched->sched_rq);
-	sched->sched_rq = NULL;
+	kfree(sched->rq);
+	sched->rq = NULL;
 }
 EXPORT_SYMBOL(drm_sched_fini);
 
@@ -1204,35 +1140,28 @@ EXPORT_SYMBOL(drm_sched_fini);
  */
 void drm_sched_increase_karma(struct drm_sched_job *bad)
 {
-	int i;
-	struct drm_sched_entity *tmp;
-	struct drm_sched_entity *entity;
 	struct drm_gpu_scheduler *sched = bad->sched;
+	struct drm_sched_entity *entity, *tmp;
+	struct drm_sched_rq *rq = sched->rq;
 
 	/* don't change @bad's karma if it's from KERNEL RQ,
 	 * because sometimes GPU hang would cause kernel jobs (like VM updating jobs)
 	 * corrupt but keep in mind that kernel jobs always considered good.
 	 */
-	if (bad->s_priority != DRM_SCHED_PRIORITY_KERNEL) {
-		atomic_inc(&bad->karma);
+	if (bad->s_priority == DRM_SCHED_PRIORITY_KERNEL)
+		return;
 
-		for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
-			struct drm_sched_rq *rq = sched->sched_rq[i];
+	atomic_inc(&bad->karma);
 
-			spin_lock(&rq->lock);
-			list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
-				if (bad->s_fence->scheduled.context ==
-				    entity->fence_context) {
-					if (entity->guilty)
-						atomic_set(entity->guilty, 1);
-					break;
-				}
-			}
-			spin_unlock(&rq->lock);
-			if (&entity->list != &rq->entities)
-				break;
+	spin_lock(&rq->lock);
+	list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
+		if (bad->s_fence->scheduled.context == entity->fence_context) {
+			if (entity->guilty)
+				atomic_set(entity->guilty, 1);
+			break;
 		}
 	}
+	spin_unlock(&rq->lock);
 }
 EXPORT_SYMBOL(drm_sched_increase_karma);
 
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index a891c3231c4c..e22f9ff88822 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -16,7 +16,7 @@ drm_sched_entity_compare_before(struct rb_node *a, const struct rb_node *b)
 	return ktime_before(ea->oldest_job_waiting, eb->oldest_job_waiting);
 }
 
-static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
+static void drm_sched_rq_remove_tree_locked(struct drm_sched_entity *entity,
 					    struct drm_sched_rq *rq)
 {
 	lockdep_assert_held(&entity->lock);
@@ -28,7 +28,7 @@ static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
 	}
 }
 
-static void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+static void drm_sched_rq_update_tree_locked(struct drm_sched_entity *entity,
 					    struct drm_sched_rq *rq,
 					    ktime_t ts)
 {
@@ -40,7 +40,7 @@ static void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
 	lockdep_assert_held(&entity->lock);
 	lockdep_assert_held(&rq->lock);
 
-	drm_sched_rq_remove_fifo_locked(entity, rq);
+	drm_sched_rq_remove_tree_locked(entity, rq);
 
 	entity->oldest_job_waiting = ts;
 
@@ -65,16 +65,6 @@ void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
 	rq->sched = sched;
 }
 
-static ktime_t
-drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
-{
-	lockdep_assert_held(&rq->lock);
-
-	rq->rr_deadline = ktime_add_ns(rq->rr_deadline, 1);
-
-	return rq->rr_deadline;
-}
-
 static ktime_t
 drm_sched_rq_get_min_vruntime(struct drm_sched_rq *rq)
 {
@@ -103,10 +93,11 @@ drm_sched_rq_get_min_vruntime(struct drm_sched_rq *rq)
  * Returns a DRM scheduler pre-selected to handle this entity.
  */
 struct drm_gpu_scheduler *
-drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts)
+drm_sched_rq_add_entity(struct drm_sched_entity *entity)
 {
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_rq *rq;
+	ktime_t ts;
 
 	/* Add the entity to the run queue */
 	spin_lock(&entity->lock);
@@ -126,14 +117,9 @@ drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts)
 		list_add_tail(&entity->list, &rq->entities);
 	}
 
-	if (drm_sched_policy == DRM_SCHED_POLICY_FAIR) {
-		ts = drm_sched_rq_get_min_vruntime(rq);
-		ts = drm_sched_entity_restore_vruntime(entity, ts);
-	} else if (drm_sched_policy == DRM_SCHED_POLICY_RR) {
-		ts = drm_sched_rq_get_rr_deadline(rq);
-	}
-
-	drm_sched_rq_update_fifo_locked(entity, rq, ts);
+	ts = drm_sched_rq_get_min_vruntime(rq);
+	ts = drm_sched_entity_restore_vruntime(entity, ts);
+	drm_sched_rq_update_tree_locked(entity, rq, ts);
 
 	spin_unlock(&rq->lock);
 	spin_unlock(&entity->lock);
@@ -162,7 +148,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 	atomic_dec(rq->sched->score);
 	list_del_init(&entity->list);
 
-	drm_sched_rq_remove_fifo_locked(entity, rq);
+	drm_sched_rq_remove_tree_locked(entity, rq);
 
 	spin_unlock(&rq->lock);
 }
@@ -190,23 +176,14 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
 	if (next_job) {
 		ktime_t ts;
 
-		if (drm_sched_policy == DRM_SCHED_POLICY_FAIR)
-			ts = drm_sched_entity_get_job_ts(entity);
-		else if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-			ts = next_job->submit_ts;
-		else
-			ts = drm_sched_rq_get_rr_deadline(rq);
-
-		drm_sched_rq_update_fifo_locked(entity, rq, ts);
+		ts = drm_sched_entity_get_job_ts(entity);
+		drm_sched_rq_update_tree_locked(entity, rq, ts);
 	} else {
-		drm_sched_rq_remove_fifo_locked(entity, rq);
+		ktime_t min_vruntime;
 
-		if (drm_sched_policy == DRM_SCHED_POLICY_FAIR) {
-			ktime_t min_vruntime;
-
-			min_vruntime = drm_sched_rq_get_min_vruntime(rq);
-			drm_sched_entity_save_vruntime(entity, min_vruntime);
-		}
+		drm_sched_rq_remove_tree_locked(entity, rq);
+		min_vruntime = drm_sched_rq_get_min_vruntime(rq);
+		drm_sched_entity_save_vruntime(entity, min_vruntime);
 	}
 	spin_unlock(&rq->lock);
 	spin_unlock(&entity->lock);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index aef86b78a901..6947a577d249 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -154,8 +154,6 @@ struct drm_sched_entity {
 	 */
 	struct spsc_queue		job_queue;
 
-	enum drm_sched_priority         rq_priority;
-
 	/**
 	 * @fence_seq:
 	 *
@@ -347,13 +345,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
  * to schedule the job.
  */
 struct drm_sched_job {
-	/**
-	 * @submit_ts:
-	 *
-	 * When the job was pushed into the entity queue.
-	 */
-	ktime_t                         submit_ts;
-
 	/**
 	 * @sched:
 	 *
@@ -569,9 +560,7 @@ struct drm_gpu_scheduler {
 	atomic_t			credit_count;
 	long				timeout;
 	const char			*name;
-	u32                             num_rqs;
-	u32                             num_user_rqs;
-	struct drm_sched_rq             **sched_rq;
+	struct drm_sched_rq             *rq;
 	wait_queue_head_t		job_scheduled;
 	atomic64_t			job_id_count;
 	struct workqueue_struct		*submit_wq;
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
                   ` (12 preceding siblings ...)
  2025-07-08  9:51 ` [PATCH v6 13/15] drm/sched: Remove FIFO and RR and simplify to a single run queue Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08 12:19   ` Philipp Stanner
  2025-07-08 12:37   ` Christian König
  2025-07-08  9:51 ` [PATCH v6 15/15] drm/sched: Embed run queue singleton into the scheduler Tvrtko Ursulin
  2025-07-08 10:40 ` [PATCH v6 00/15] Fair DRM scheduler Philipp Stanner
  15 siblings, 2 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

There is no reason to queue just a single job if the scheduler can take
more, only to re-queue the worker so it can queue the next one. We can
simply feed the hardware with as much as it can take in one go and
hopefully win some latency.
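
The batching idea can be illustrated with a tiny stand-alone model (all
names below are made up for the illustration and are not the scheduler's
real API):

  #include <stdio.h>

  struct job { unsigned int credits; };

  int main(void)
  {
  	struct job queue[] = { {3}, {2}, {4}, {1} };
  	unsigned int credit_limit = 8, credit_count = 0;
  	unsigned int i, submitted = 0;

  	/* One worker invocation: submit until the credit limit is
  	 * reached, instead of submitting one job and re-queuing.
  	 */
  	for (i = 0; i < sizeof(queue) / sizeof(queue[0]); i++) {
  		if (credit_count + queue[i].credits > credit_limit)
  			break;	/* hardware full, try again later */
  		credit_count += queue[i].credits;
  		submitted++;
  	}

  	/* Prints "submitted 2 jobs, 5/8 credits used": the 3 and 2
  	 * credit jobs fit, the 4 credit job has to wait.
  	 */
  	printf("submitted %u jobs, %u/%u credits used\n",
  	       submitted, credit_count, credit_limit);
  	return 0;
  }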

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_internal.h |   2 -
 drivers/gpu/drm/scheduler/sched_main.c     | 132 ++++++++++-----------
 drivers/gpu/drm/scheduler/sched_rq.c       |  12 +-
 3 files changed, 64 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index 15d78abc48df..1a5c2f255223 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -22,8 +22,6 @@ struct drm_sched_entity_stats {
 	u64		vruntime;
 };
 
-bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
-			 struct drm_sched_entity *entity);
 void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
 
 void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 35025edea669..1fb3f1da4821 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -95,35 +95,6 @@ static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
 	return credits;
 }
 
-/**
- * drm_sched_can_queue -- Can we queue more to the hardware?
- * @sched: scheduler instance
- * @entity: the scheduler entity
- *
- * Return true if we can push at least one more job from @entity, false
- * otherwise.
- */
-bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
-			 struct drm_sched_entity *entity)
-{
-	struct drm_sched_job *s_job;
-
-	s_job = drm_sched_entity_queue_peek(entity);
-	if (!s_job)
-		return false;
-
-	/* If a job exceeds the credit limit, truncate it to the credit limit
-	 * itself to guarantee forward progress.
-	 */
-	if (s_job->credits > sched->credit_limit) {
-		dev_WARN(sched->dev,
-			 "Jobs may not exceed the credit limit, truncate.\n");
-		s_job->credits = sched->credit_limit;
-	}
-
-	return drm_sched_available_credits(sched) >= s_job->credits;
-}
-
 /**
  * drm_sched_run_job_queue - enqueue run-job work
  * @sched: scheduler instance
@@ -940,54 +911,77 @@ static void drm_sched_run_job_work(struct work_struct *w)
 {
 	struct drm_gpu_scheduler *sched =
 		container_of(w, struct drm_gpu_scheduler, work_run_job);
+	u32 job_credits, submitted_credits = 0;
 	struct drm_sched_entity *entity;
-	struct dma_fence *fence;
 	struct drm_sched_fence *s_fence;
 	struct drm_sched_job *sched_job;
-	int r;
+	struct dma_fence *fence;
 
-	/* Find entity with a ready job */
-	entity = drm_sched_rq_select_entity(sched, sched->rq);
-	if (IS_ERR_OR_NULL(entity))
-		return;	/* No more work */
+	while (!READ_ONCE(sched->pause_submit)) {
+		/* Find entity with a ready job */
+		entity = drm_sched_rq_select_entity(sched, sched->rq);
+		if (!entity)
+			break;	/* No more work */
+
+		sched_job = drm_sched_entity_queue_peek(entity);
+		if (!sched_job) {
+			complete_all(&entity->entity_idle);
+			continue;
+		}
+
+		job_credits = sched_job->credits;
+		/*
+		 * If a job exceeds the credit limit truncate it to guarantee
+		 * forward progress.
+		 */
+		if (dev_WARN_ONCE(sched->dev, job_credits > sched->credit_limit,
+				  "Jobs may not exceed the credit limit, truncating.\n"))
+			job_credits = sched_job->credits = sched->credit_limit;
+
+		if (job_credits > drm_sched_available_credits(sched)) {
+			complete_all(&entity->entity_idle);
+			break;
+		}
+
+		sched_job = drm_sched_entity_pop_job(entity);
+		if (!sched_job) {
+			/* Top entity is not yet runnable after all */
+			complete_all(&entity->entity_idle);
+			continue;
+		}
+
+		s_fence = sched_job->s_fence;
+		drm_sched_job_begin(sched_job);
+		trace_drm_sched_job_run(sched_job, entity);
+		submitted_credits += job_credits;
+		atomic_add(job_credits, &sched->credit_count);
+
+		fence = sched->ops->run_job(sched_job);
+		drm_sched_fence_scheduled(s_fence, fence);
+
+		if (!IS_ERR_OR_NULL(fence)) {
+			int r;
+
+			/* Drop for original kref_init of the fence */
+			dma_fence_put(fence);
+
+			r = dma_fence_add_callback(fence, &sched_job->cb,
+						   drm_sched_job_done_cb);
+			if (r == -ENOENT)
+				drm_sched_job_done(sched_job, fence->error);
+			else if (r)
+				DRM_DEV_ERROR(sched->dev,
+					      "fence add callback failed (%d)\n", r);
+		} else {
+			drm_sched_job_done(sched_job, IS_ERR(fence) ?
+						      PTR_ERR(fence) : 0);
+		}
 
-	sched_job = drm_sched_entity_pop_job(entity);
-	if (!sched_job) {
 		complete_all(&entity->entity_idle);
-		drm_sched_run_job_queue(sched);
-		return;
 	}
 
-	s_fence = sched_job->s_fence;
-
-	atomic_add(sched_job->credits, &sched->credit_count);
-	drm_sched_job_begin(sched_job);
-
-	trace_drm_sched_job_run(sched_job, entity);
-	/*
-	 * The run_job() callback must by definition return a fence whose
-	 * refcount has been incremented for the scheduler already.
-	 */
-	fence = sched->ops->run_job(sched_job);
-	complete_all(&entity->entity_idle);
-	drm_sched_fence_scheduled(s_fence, fence);
-
-	if (!IS_ERR_OR_NULL(fence)) {
-		r = dma_fence_add_callback(fence, &sched_job->cb,
-					   drm_sched_job_done_cb);
-		if (r == -ENOENT)
-			drm_sched_job_done(sched_job, fence->error);
-		else if (r)
-			DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
-
-		dma_fence_put(fence);
-	} else {
-		drm_sched_job_done(sched_job, IS_ERR(fence) ?
-				   PTR_ERR(fence) : 0);
-	}
-
-	wake_up(&sched->job_scheduled);
-	drm_sched_run_job_queue(sched);
+	if (submitted_credits)
+		wake_up(&sched->job_scheduled);
 }
 
 static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index e22f9ff88822..f0afdc0bd417 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -197,9 +197,7 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
  *
  * Find oldest waiting ready entity.
  *
- * Return an entity if one is found; return an error-pointer (!NULL) if an
- * entity was ready, but the scheduler had insufficient credits to accommodate
- * its job; return NULL, if no ready entity was found.
+ * Return an entity if one is found or NULL if no ready entity was found.
  */
 struct drm_sched_entity *
 drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
@@ -213,14 +211,6 @@ drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
 
 		entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
 		if (drm_sched_entity_is_ready(entity)) {
-			/* If we can't queue yet, preserve the current entity in
-			 * terms of fairness.
-			 */
-			if (!drm_sched_can_queue(sched, entity)) {
-				spin_unlock(&rq->lock);
-				return ERR_PTR(-ENOSPC);
-			}
-
 			reinit_completion(&entity->entity_idle);
 			break;
 		}
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 15/15] drm/sched: Embed run queue singleton into the scheduler
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
                   ` (13 preceding siblings ...)
  2025-07-08  9:51 ` [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation Tvrtko Ursulin
@ 2025-07-08  9:51 ` Tvrtko Ursulin
  2025-07-08 10:40 ` [PATCH v6 00/15] Fair DRM scheduler Philipp Stanner
  15 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08  9:51 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

Now that the run queue to scheduler relationship is always 1:1, we can
embed the run queue directly in the scheduler struct and save on some
allocation and error handling code.
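
The container_of() pattern this relies on can be sketched stand-alone
(simplified structs and a userspace approximation of the kernel macro,
purely for illustration):

  #include <stddef.h>
  #include <stdio.h>

  #define container_of(ptr, type, member) \
  	((type *)((char *)(ptr) - offsetof(type, member)))

  struct rq_toy { int dummy; };

  struct sched_toy {
  	const char *name;
  	struct rq_toy rq;	/* embedded 1:1, no back-pointer needed */
  };

  int main(void)
  {
  	struct sched_toy sched = { .name = "gfx" };
  	struct rq_toy *rq = &sched.rq;

  	/* Recover the scheduler from the embedded run queue; this is
  	 * what allows deleting the rq->sched back-pointer.
  	 */
  	struct sched_toy *s = container_of(rq, struct sched_toy, rq);

  	printf("%s\n", s->name);	/* prints "gfx" */
  	return 0;
  }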

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      |  6 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |  6 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h     |  5 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h   |  8 ++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  8 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c     |  8 +++---
 drivers/gpu/drm/scheduler/sched_entity.c    | 32 +++++++++------------
 drivers/gpu/drm/scheduler/sched_fence.c     |  2 +-
 drivers/gpu/drm/scheduler/sched_internal.h  |  6 ++--
 drivers/gpu/drm/scheduler/sched_main.c      | 31 ++++----------------
 drivers/gpu/drm/scheduler/sched_rq.c        | 18 ++++++------
 include/drm/gpu_scheduler.h                 |  5 +---
 12 files changed, 58 insertions(+), 77 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index a2adaacf6adb..a7b7db91b2d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1131,7 +1131,8 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
 	if (p->gang_size > 1 && !adev->vm_manager.concurrent_flush) {
 		for (i = 0; i < p->gang_size; ++i) {
 			struct drm_sched_entity *entity = p->entities[i];
-			struct drm_gpu_scheduler *sched = entity->rq->sched;
+			struct drm_gpu_scheduler *sched =
+				container_of(entity->rq, typeof(*sched), rq);
 			struct amdgpu_ring *ring = to_amdgpu_ring(sched);
 
 			if (amdgpu_vmid_uses_reserved(vm, ring->vm_hub))
@@ -1259,7 +1260,8 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
 			return r;
 	}
 
-	sched = p->gang_leader->base.entity->rq->sched;
+	sched = container_of(p->gang_leader->base.entity->rq, typeof(*sched),
+			     rq);
 	while ((fence = amdgpu_sync_get_fence(&p->sync))) {
 		struct drm_sched_fence *s_fence = to_drm_sched_fence(fence);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index c9534cc439b9..9fe5f7ec850d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -338,7 +338,9 @@ static struct dma_fence *
 amdgpu_job_prepare_job(struct drm_sched_job *sched_job,
 		      struct drm_sched_entity *s_entity)
 {
-	struct amdgpu_ring *ring = to_amdgpu_ring(s_entity->rq->sched);
+	struct drm_gpu_scheduler *sched =
+		container_of(s_entity->rq, typeof(*sched), rq);
+	struct amdgpu_ring *ring = to_amdgpu_ring(sched);
 	struct amdgpu_job *job = to_amdgpu_job(sched_job);
 	struct dma_fence *fence;
 	int r;
@@ -438,7 +440,7 @@ drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
 
 void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
 {
-	struct drm_sched_rq *rq = sched->rq;
+	struct drm_sched_rq *rq = &sched->rq;
 	struct drm_sched_entity *s_entity;
 	struct drm_sched_job *s_job;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 2f302266662b..d7415eb32982 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -86,7 +86,10 @@ struct amdgpu_job {
 
 static inline struct amdgpu_ring *amdgpu_job_ring(struct amdgpu_job *job)
 {
-	return to_amdgpu_ring(job->base.entity->rq->sched);
+	struct drm_gpu_scheduler *sched =
+		container_of(job->base.entity->rq, typeof(*sched), rq);
+
+	return to_amdgpu_ring(sched);
 }
 
 int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index d13e64a69e25..85724ec6aaf8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -145,6 +145,7 @@ TRACE_EVENT(amdgpu_cs,
 		     struct amdgpu_ib *ib),
 	    TP_ARGS(p, job, ib),
 	    TP_STRUCT__entry(
+			     __field(struct drm_gpu_scheduler *, sched)
 			     __field(struct amdgpu_bo_list *, bo_list)
 			     __field(u32, ring)
 			     __field(u32, dw)
@@ -152,11 +153,14 @@ TRACE_EVENT(amdgpu_cs,
 			     ),
 
 	    TP_fast_assign(
+			   __entry->sched = container_of(job->base.entity->rq,
+							 typeof(*__entry->sched),
+							 rq);
 			   __entry->bo_list = p->bo_list;
-			   __entry->ring = to_amdgpu_ring(job->base.entity->rq->sched)->idx;
+			   __entry->ring = to_amdgpu_ring(__entry->sched)->idx;
 			   __entry->dw = ib->length_dw;
 			   __entry->fences = amdgpu_fence_count_emitted(
-				to_amdgpu_ring(job->base.entity->rq->sched));
+				to_amdgpu_ring(__entry->sched));
 			   ),
 	    TP_printk("bo_list=%p, ring=%u, dw=%u, fences=%u",
 		      __entry->bo_list, __entry->ring, __entry->dw,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index 46d9fb433ab2..42f2bfb30af1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -105,13 +105,13 @@ static int amdgpu_vm_sdma_prepare(struct amdgpu_vm_update_params *p,
 static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
 				 struct dma_fence **fence)
 {
+	struct drm_gpu_scheduler *sched =
+		container_of(p->vm->delayed.rq, typeof(*sched), rq);
+	struct amdgpu_ring *ring =
+		container_of(sched, struct amdgpu_ring, sched);
 	struct amdgpu_ib *ib = p->job->ibs;
-	struct amdgpu_ring *ring;
 	struct dma_fence *f;
 
-	ring = container_of(p->vm->delayed.rq->sched, struct amdgpu_ring,
-			    sched);
-
 	WARN_ON(ib->length_dw == 0);
 	amdgpu_ring_pad_ib(ring, ib);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index c8fcafeb6864..f0e820e08d1e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -432,15 +432,15 @@ int amdgpu_xcp_open_device(struct amdgpu_device *adev,
 void amdgpu_xcp_release_sched(struct amdgpu_device *adev,
 				  struct amdgpu_ctx_entity *entity)
 {
-	struct drm_gpu_scheduler *sched;
-	struct amdgpu_ring *ring;
+	struct drm_gpu_scheduler *sched =
+		container_of(entity->entity.rq, typeof(*sched), rq);
 
 	if (!adev->xcp_mgr)
 		return;
 
-	sched = entity->entity.rq->sched;
 	if (drm_sched_wqueue_ready(sched)) {
-		ring = to_amdgpu_ring(entity->entity.rq->sched);
+		struct amdgpu_ring *ring = to_amdgpu_ring(sched);
+
 		atomic_dec(&adev->xcp_mgr->xcp[ring->xcp_id].ref_cnt);
 	}
 }
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 30dba2409ea4..be24cddc811f 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -114,19 +114,12 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	 * is initialized itself.
 	 */
 	entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
+	if (num_sched_list) {
+		entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
+		entity->rq = &sched_list[0]->rq;
+	}
 	RCU_INIT_POINTER(entity->last_scheduled, NULL);
 	RB_CLEAR_NODE(&entity->rb_tree_node);
-
-	if (num_sched_list && !sched_list[0]->rq) {
-		/* Since every entry covered by num_sched_list
-		 * should be non-NULL and therefore we warn drivers
-		 * not to do this and to fix their DRM calling order.
-		 */
-		pr_warn("%s: called with uninitialized scheduler\n", __func__);
-	} else if (num_sched_list) {
-		entity->rq = sched_list[0]->rq;
-	}
-
 	init_completion(&entity->entity_idle);
 
 	/* We start in an idle state. */
@@ -314,7 +307,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
 	if (!entity->rq)
 		return 0;
 
-	sched = entity->rq->sched;
+	sched = container_of(entity->rq, typeof(*sched), rq);
 	/**
 	 * The client will not queue more IBs during this fini, consume existing
 	 * queued IBs or discard them on SIGKILL
@@ -406,9 +399,11 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
 {
 	struct drm_sched_entity *entity =
 		container_of(cb, struct drm_sched_entity, cb);
+	struct drm_gpu_scheduler *sched =
+		container_of(entity->rq, typeof(*sched), rq);
 
 	drm_sched_entity_clear_dep(f, cb);
-	drm_sched_wakeup(entity->rq->sched);
+	drm_sched_wakeup(sched);
 }
 
 /**
@@ -434,7 +429,8 @@ EXPORT_SYMBOL(drm_sched_entity_set_priority);
  */
 static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
 {
-	struct drm_gpu_scheduler *sched = entity->rq->sched;
+	struct drm_gpu_scheduler *sched =
+		container_of(entity->rq, typeof(*sched), rq);
 	struct dma_fence *fence = entity->dependency;
 	struct drm_sched_fence *s_fence;
 
@@ -573,7 +569,7 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
 
 	spin_lock(&entity->lock);
 	sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
-	rq = sched ? sched->rq : NULL;
+	rq = sched ? &sched->rq : NULL;
 	if (rq != entity->rq) {
 		drm_sched_rq_remove_entity(entity->rq, entity);
 		entity->rq = rq;
@@ -596,6 +592,8 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
 void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 {
 	struct drm_sched_entity *entity = sched_job->entity;
+	struct drm_gpu_scheduler *sched =
+		container_of(entity->rq, typeof(*sched), rq);
 	bool first;
 
 	trace_drm_sched_job_queue(sched_job, entity);
@@ -607,7 +605,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 		xa_for_each(&sched_job->dependencies, index, entry)
 			trace_drm_sched_job_add_dep(sched_job, entry);
 	}
-	atomic_inc(entity->rq->sched->score);
+	atomic_inc(sched->score);
 	WRITE_ONCE(entity->last_user, current->group_leader);
 
 	/*
@@ -618,8 +616,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 
 	/* first job wakes up scheduler */
 	if (first) {
-		struct drm_gpu_scheduler *sched;
-
 		sched = drm_sched_rq_add_entity(entity);
 		if (sched)
 			drm_sched_wakeup(sched);
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 9391d6f0dc01..da4f53a9ca35 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -227,7 +227,7 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
 {
 	unsigned seq;
 
-	fence->sched = entity->rq->sched;
+	fence->sched = container_of(entity->rq, typeof(*fence->sched), rq);
 	seq = atomic_inc_return(&entity->fence_seq);
 	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
 		       &fence->lock, entity->fence_context, seq);
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index 1a5c2f255223..3483ed52da1f 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -24,11 +24,9 @@ struct drm_sched_entity_stats {
 
 void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
 
-void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
-		       struct drm_sched_rq *rq);
+void drm_sched_rq_init(struct drm_gpu_scheduler *sched);
 struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
-			   struct drm_sched_rq *rq);
+drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched);
 struct drm_gpu_scheduler *
 drm_sched_rq_add_entity(struct drm_sched_entity *entity);
 void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 1fb3f1da4821..b14f98ac91b9 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -578,7 +578,7 @@ void drm_sched_job_arm(struct drm_sched_job *job)
 
 	BUG_ON(!entity);
 	drm_sched_entity_select_rq(entity);
-	sched = entity->rq->sched;
+	sched = container_of(entity->rq, typeof(*sched), rq);
 
 	job->sched = sched;
 	job->s_priority = entity->priority;
@@ -919,7 +919,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
 
 	while (!READ_ONCE(sched->pause_submit)) {
 		/* Find entity with a ready job */
-		entity = drm_sched_rq_select_entity(sched, sched->rq);
+		entity = drm_sched_rq_select_entity(sched);
 		if (!entity)
 			break;	/* No more work */
 
@@ -1022,15 +1022,6 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
 	sched->score = args->score ? args->score : &sched->_score;
 	sched->dev = args->dev;
 
-	if (sched->rq) {
-		/* Not an error, but warn anyway so drivers can
-		 * fine-tune their DRM calling order, and return all
-		 * is good.
-		 */
-		dev_warn(sched->dev, "%s: scheduler already initialized!\n", __func__);
-		return 0;
-	}
-
 	if (args->submit_wq) {
 		sched->submit_wq = args->submit_wq;
 		sched->own_submit_wq = false;
@@ -1042,11 +1033,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
 		sched->own_submit_wq = true;
 	}
 
-	sched->rq = kmalloc(sizeof(*sched->rq), GFP_KERNEL | __GFP_ZERO);
-	if (!sched->rq)
-		goto Out_check_own;
-
-	drm_sched_rq_init(sched, sched->rq);
+	drm_sched_rq_init(sched);
 
 	init_waitqueue_head(&sched->job_scheduled);
 	INIT_LIST_HEAD(&sched->pending_list);
@@ -1061,12 +1048,6 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
 
 	sched->ready = true;
 	return 0;
-
-Out_check_own:
-	if (sched->own_submit_wq)
-		destroy_workqueue(sched->submit_wq);
-	dev_err(sched->dev, "%s: Failed to setup GPU scheduler--out of memory\n", __func__);
-	return -ENOMEM;
 }
 EXPORT_SYMBOL(drm_sched_init);
 
@@ -1094,7 +1075,7 @@ EXPORT_SYMBOL(drm_sched_init);
 void drm_sched_fini(struct drm_gpu_scheduler *sched)
 {
 
-	struct drm_sched_rq *rq = sched->rq;
+	struct drm_sched_rq *rq = &sched->rq;
 	struct drm_sched_entity *s_entity;
 
 	drm_sched_wqueue_stop(sched);
@@ -1118,8 +1099,6 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 	if (sched->own_submit_wq)
 		destroy_workqueue(sched->submit_wq);
 	sched->ready = false;
-	kfree(sched->rq);
-	sched->rq = NULL;
 }
 EXPORT_SYMBOL(drm_sched_fini);
 
@@ -1136,7 +1115,7 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
 {
 	struct drm_gpu_scheduler *sched = bad->sched;
 	struct drm_sched_entity *entity, *tmp;
-	struct drm_sched_rq *rq = sched->rq;
+	struct drm_sched_rq *rq = &sched->rq;
 
 	/* don't change @bad's karma if it's from KERNEL RQ,
 	 * because sometimes GPU hang would cause kernel jobs (like VM updating jobs)
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index f0afdc0bd417..0d7a433ac2e3 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -52,17 +52,16 @@ static void drm_sched_rq_update_tree_locked(struct drm_sched_entity *entity,
  * drm_sched_rq_init - initialize a given run queue struct
  *
  * @sched: scheduler instance to associate with this run queue
- * @rq: scheduler run queue
  *
  * Initializes a scheduler runqueue.
  */
-void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
-		       struct drm_sched_rq *rq)
+void drm_sched_rq_init(struct drm_gpu_scheduler *sched)
 {
+	struct drm_sched_rq *rq = &sched->rq;
+
 	spin_lock_init(&rq->lock);
 	INIT_LIST_HEAD(&rq->entities);
 	rq->rb_tree_root = RB_ROOT_CACHED;
-	rq->sched = sched;
 }
 
 static ktime_t
@@ -109,8 +108,8 @@ drm_sched_rq_add_entity(struct drm_sched_entity *entity)
 	}
 
 	rq = entity->rq;
+	sched = container_of(rq, typeof(*sched), rq);
 	spin_lock(&rq->lock);
-	sched = rq->sched;
 
 	if (list_empty(&entity->list)) {
 		atomic_inc(sched->score);
@@ -138,6 +137,8 @@ drm_sched_rq_add_entity(struct drm_sched_entity *entity)
 void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 				struct drm_sched_entity *entity)
 {
+	struct drm_gpu_scheduler *sched = container_of(rq, typeof(*sched), rq);
+
 	lockdep_assert_held(&entity->lock);
 
 	if (list_empty(&entity->list))
@@ -145,7 +146,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 
 	spin_lock(&rq->lock);
 
-	atomic_dec(rq->sched->score);
+	atomic_dec(sched->score);
 	list_del_init(&entity->list);
 
 	drm_sched_rq_remove_tree_locked(entity, rq);
@@ -193,16 +194,15 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
  * drm_sched_rq_select_entity - Select an entity which provides a job to run
  *
  * @sched: the gpu scheduler
- * @rq: scheduler run queue to check.
  *
  * Find oldest waiting ready entity.
  *
  * Return an entity if one is found or NULL if no ready entity was found.
  */
 struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
-			   struct drm_sched_rq *rq)
+drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched)
 {
+	struct drm_sched_rq *rq = &sched->rq;
 	struct rb_node *rb;
 
 	spin_lock(&rq->lock);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 6947a577d249..c51ac2ab8c30 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -245,7 +245,6 @@ struct drm_sched_entity {
 /**
  * struct drm_sched_rq - queue of entities to be scheduled.
  *
- * @sched: the scheduler to which this rq belongs to.
  * @lock: protects @entities, @rb_tree_root and @rr_deadline.
  * @rr_deadline: monotonically incrementing fake timestamp for RR mode
  * @entities: list of the entities to be scheduled.
@@ -256,8 +255,6 @@ struct drm_sched_entity {
  * the next entity to emit commands from.
  */
 struct drm_sched_rq {
-	struct drm_gpu_scheduler	*sched;
-
 	spinlock_t			lock;
 	/* Following members are protected by the @lock: */
 	ktime_t				rr_deadline;
@@ -560,7 +557,7 @@ struct drm_gpu_scheduler {
 	atomic_t			credit_count;
 	long				timeout;
 	const char			*name;
-	struct drm_sched_rq             *rq;
+	struct drm_sched_rq             rq;
 	wait_queue_head_t		job_scheduled;
 	atomic64_t			job_id_count;
 	struct workqueue_struct		*submit_wq;
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 00/15] Fair DRM scheduler
  2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
                   ` (14 preceding siblings ...)
  2025-07-08  9:51 ` [PATCH v6 15/15] drm/sched: Embed run queue singleton into the scheduler Tvrtko Ursulin
@ 2025-07-08 10:40 ` Philipp Stanner
  15 siblings, 0 replies; 31+ messages in thread
From: Philipp Stanner @ 2025-07-08 10:40 UTC (permalink / raw)
  To: Tvrtko Ursulin, dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Christian König,
	Danilo Krummrich, Leo Liu, Matthew Brost, Philipp Stanner,
	Pierre-Eric Pelloux-Prayer, Michel Dänzer

You keep forgetting the "RFC" in your recent series. Please add it
next time.

P.

On Tue, 2025-07-08 at 10:51 +0100, Tvrtko Ursulin wrote:
> As a summary, the new scheduling algorithm is insipired by the
> original Linux
> CFS and so far no scheduling regressions have been found. There are
> improvements
> in fairness and scheduling of interactive clients when running in
> parallel with
> a heavy GPU load (for example Pierre-Eric has one viewperf medical
> test which
> shows a nice improvement with amdgpu).
> 
> On the high level main advantages of the series are:
> 
>  1. Scheduling quality - schedules better than FIFO, solves priority
> starvation.
>  2. Code simplification - no more multiple run queues and multiple
> algorithms.
>  3. Virtual GPU time based scheduling enables relatively simple
> addition
>     of a scheduling cgroup controller in the future.
> 
> There is a little bit more detailed write up on the motivation and
> results in
> the form of a blog post which may be easier to read:
> https://blogs.igalia.com/tursulin/fair-er-drm-gpu-scheduler/
> 
> First patches add some unit tests which allow for easy evaluation of
> scheduling
> behaviour against different client submission patterns. From there
> onwards it is
> hopefully a natural progression of cleanups, enablers, adding the
> fair policy,
> and finally removing FIFO and RR and simplifying the code base due no
> more need
> for multiple run queues.
> 
> As a headline result I have tested three simultaneous clients on the
> Steam Deck:
> 
> One instance of a deferredmultisampling Vulkan demo running with low
> priority,
> one normal priority instance of the same demo, and the Unigine Heaven
> benchmark.
> 
> With the FIFO scheduler we can see that the low priority client is
> completely
> starved and the GPU time distribution between the other two clients
> is uneven:
> 
> https://people.igalia.com/tursulin/drm-sched-fair/fifo-starvation.png
> 
> Switching to the fair scheduler, GPU time distribution is almost
> equal and the
> low priority client does get a small share of the GPU:
> 
> https://people.igalia.com/tursulin/drm-sched-fair/fair-no-starvation.png
> 
> Moving onto the synthetic submission patterns, they are about two
> simultaneous
> clients which broadly cover the following categories:
> 
>  * Deep queue clients
>  * Hogs versus interactive
>  * Priority handling
> 
> Lets look at the results:
> 
> 1. Two normal priority deep queue clients.
> 
> These ones submit one second worth of 8ms jobs. As fast as they can,
> no
> dependencies etc. There is no difference in runtime between FIFO and
> fair but
> the latter allows both clients to progress with work more evenly:
> 
> https://people.igalia.com/tursulin/drm-sched-fair/normal-normal.png
> 
> (X axis is time, Y is submitted queue-depth, hence lowering of qd
> corresponds
>   with work progress for both clients, tested with both schedulers
> separately.)
> 
> 2. Same two clients but one is now low priority.
> 
> https://people.igalia.com/tursulin/drm-sched-fair/normal-low.png
> 
> Normal priority client is a solid line, low priority dotted. We can
> see how FIFO
> completely starves the low priority client until the normal priority
> is fully
> done. Only then the low priority client gets any GPU time.
> 
> In contrast, the fair scheduler allows some GPU time to the low priority
> client.
> 
> 3. Same clients but now high versus normal priority.
> 
> Similar behaviour as in the previous test, with normal de-prioritised
> relative to high somewhat less than low was against normal.
> 
> https://people.igalia.com/tursulin/drm-sched-fair/high-normal.png
> 
> 4. Heavy load vs interactive client.
> 
> Heavy client emits a 75% GPU load in the format of 3x 2.5ms jobs followed
> by a 2.5ms wait. Interactive client emits a 10% GPU load in the format of
> 1x 1ms job followed by a 9ms wait.
> 
> This simulates an interactive graphical client used on top of a relatively
> heavy background load but no GPU oversubscription.
> 
> Graphs show the interactive client only and from now on, instead of looking
> at the client's queue depth, we look at its "fps".
> 
> https://people.igalia.com/tursulin/drm-sched-fair/heavy-interactive.png
> 
> We can see that the fair scheduler allows a higher fps for the interactive
> client, which is good.
> 
> 5. An even heavier load vs interactive client.
> 
> This one is oversubscribing the GPU by submitting 4x 50ms jobs and waiting
> for only one microsecond before repeating the cycle. Interactive client is
> the same 10% as above.
> 
> https://people.igalia.com/tursulin/drm-sched-fair/veryheavy-interactive.png
> 
> Here the difference is even more dramatic, with the fair scheduler enabling
> ~3x the framerate for the interactive client.
> 
> 6. Low priority GPU hog versus heavy-interactive.
> 
> Low priority client: 3x 2.5ms jobs followed by a 0.5ms wait.
> Interactive client: 1x 0.5ms job followed by a 10ms wait.
> 
> https://people.igalia.com/tursulin/drm-sched-fair/lowhog-interactive.png
> 
> Slight win for the fair scheduler but could be just noise.
> 
> 7. The last set of test scenarios has three subgroups.
> 
> In all cases we have two interactive (synchronous, single job at a time)
> clients with a 50% "duty cycle" GPU time usage.
> 
> Client 1: 1.5ms job + 1.5ms wait (aka short bursty)
> Client 2: 2.5ms job + 2.5ms wait (aka long bursty)
> 
> a) Both normal priority.
> 
> https://people.igalia.com/tursulin/drm-sched-fair/5050-short.png
> https://people.igalia.com/tursulin/drm-sched-fair/5050-long.png
> 
> Both schedulers favour the higher frequency duty cycle, with fair giving it
> a little bit more, which should be good for interactivity.
> 
> b) Normal vs low priority.
> 
> https://people.igalia.com/tursulin/drm-sched-fair/5050-normal-low-normal.png
> https://people.igalia.com/tursulin/drm-sched-fair/5050-normal-low-low.png
> 
> The fair scheduler gives a bit more GPU time to the normal priority client,
> which is again good.
> 
> c) High vs normal priority.
> 
> https://people.igalia.com/tursulin/drm-sched-fair/5050-high-normal-high.png
> https://people.igalia.com/tursulin/drm-sched-fair/5050-high-normal-normal.png
> 
> Again, the fair scheduler gives a bit more share to the higher priority
> client.
> 
> As before, I am looking for feedback, ideas for what other kinds of
> submission scenarios to test, testing on different GPUs and of course
> reviews.
> 
> v2:
>  * Fixed many rebase errors.
>  * Added some new patches.
>  * Dropped single shot dependency handling.
> 
> v3:
>  * Added scheduling quality unit tests.
>  * Refined a tiny bit by adding some fairness.
>  * Dropped a few patches for now.
> 
> v4:
>  * Replaced deadline with fair!
>  * Refined scheduling quality unit tests.
>  * Pulled one cleanup patch earlier.
>  * Fixed "drm/sched: Avoid double re-lock on the job free path".
> 
> v5:
>  * Rebase on top of latest upstream DRM scheduler changes.
>  * Kerneldoc fixup.
>  * Improve commit message justification for one patch. (Philipp)
>  * Add comment in drm_sched_alloc_wq. (Christian)
> 
> v6:
>  * Rebase for "drm/sched: De-clutter drm_sched_init" getting merged.
>  * Avoid NULL rq dereference from a bad rebase. (Maira)
>  * Added some kerneldoc throughout. (Maira)
>  * Removed some lockdep annotations not belonging to one patch. (Maira)
>  * Use dma_fence_is_signaled in "drm/sched: Avoid double re-lock on the
>    job free path". (Maira, Philipp)
> 
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> CC: Leo Liu <Leo.Liu@amd.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> Cc: Michel Dänzer <michel.daenzer@mailbox.org>
> 
> Tvrtko Ursulin (15):
>   drm/sched: Add some scheduling quality unit tests
>   drm/sched: Add some more scheduling quality unit tests
>   drm/sched: Avoid double re-lock on the job free path
>   drm/sched: Consolidate drm_sched_job_timedout
>   drm/sched: Consolidate drm_sched_rq_select_entity_rr
>   drm/sched: Implement RR via FIFO
>   drm/sched: Consolidate entity run queue management
>   drm/sched: Move run queue related code into a separate file
>   drm/sched: Free all finished jobs at once
>   drm/sched: Account entity GPU time
>   drm/sched: Remove idle entity from tree
>   drm/sched: Add fair scheduling policy
>   drm/sched: Remove FIFO and RR and simplify to a single run queue
>   drm/sched: Queue all free credits in one worker invocation
>   drm/sched: Embed run queue singleton into the scheduler
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |   6 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |  27 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h       |   5 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h     |   8 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c   |   8 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c       |   8 +-
>  drivers/gpu/drm/scheduler/Makefile            |   2 +-
>  drivers/gpu/drm/scheduler/sched_entity.c      | 131 ++-
>  drivers/gpu/drm/scheduler/sched_fence.c       |   2 +-
>  drivers/gpu/drm/scheduler/sched_internal.h    | 146 +++-
>  drivers/gpu/drm/scheduler/sched_main.c        | 548 +++---------
>  drivers/gpu/drm/scheduler/sched_rq.c          | 221 +++++
>  drivers/gpu/drm/scheduler/tests/Makefile      |   3 +-
>  .../gpu/drm/scheduler/tests/tests_scheduler.c | 815
> ++++++++++++++++++
>  include/drm/gpu_scheduler.h                   |  31 +-
>  15 files changed, 1396 insertions(+), 565 deletions(-)
>  create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c
>  create mode 100644 drivers/gpu/drm/scheduler/tests/tests_scheduler.c
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 03/15] drm/sched: Avoid double re-lock on the job free path
  2025-07-08  9:51 ` [PATCH v6 03/15] drm/sched: Avoid double re-lock on the job free path Tvrtko Ursulin
@ 2025-07-08 11:22   ` Philipp Stanner
  2025-07-08 12:23     ` Tvrtko Ursulin
  0 siblings, 1 reply; 31+ messages in thread
From: Philipp Stanner @ 2025-07-08 11:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Christian König,
	Danilo Krummrich, Matthew Brost, Philipp Stanner

On Tue, 2025-07-08 at 10:51 +0100, Tvrtko Ursulin wrote:
> Currently the job free work item will lock sched->job_list_lock a first
> time to see if there are any jobs, free a single job, and then lock again
> to decide whether to re-queue itself if there are more finished jobs.
> 
> Since drm_sched_get_finished_job() already looks at the second job in the
> queue we can simply add the signaled check and have it return the presence
> of more jobs to free to the caller. That way the work item does not have
> to lock the list again and repeat the signaled check.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>

This one can be sent separately, like the one for drm_sched_init()
recently, can't it?

P.

> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 37 ++++++++++----------------
>  1 file changed, 14 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 1f077782ec12..1bce0b66f89c 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -366,22 +366,6 @@ static void __drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
>  		queue_work(sched->submit_wq, &sched->work_free_job);
>  }
>  
> -/**
> - * drm_sched_run_free_queue - enqueue free-job work if ready
> - * @sched: scheduler instance
> - */
> -static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
> -{
> -	struct drm_sched_job *job;
> -
> -	spin_lock(&sched->job_list_lock);
> -	job = list_first_entry_or_null(&sched->pending_list,
> -				       struct drm_sched_job, list);
> -	if (job && dma_fence_is_signaled(&job->s_fence->finished))
> -		__drm_sched_run_free_queue(sched);
> -	spin_unlock(&sched->job_list_lock);
> -}
> -
>  /**
>   * drm_sched_job_done - complete a job
>   * @s_job: pointer to the job which is done
> @@ -1102,12 +1086,13 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>   * drm_sched_get_finished_job - fetch the next finished job to be destroyed
>   *
>   * @sched: scheduler instance
> + * @have_more: are there more finished jobs on the list
>   *
>   * Returns the next finished job from the pending list (if there is one)
>   * ready for it to be destroyed.
>   */
>  static struct drm_sched_job *
> -drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
> +drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool *have_more)
>  {
>  	struct drm_sched_job *job, *next;
>  
> @@ -1115,22 +1100,25 @@ drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
>  
>  	job = list_first_entry_or_null(&sched->pending_list,
>  				       struct drm_sched_job, list);
> -
>  	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>  		/* remove job from pending_list */
>  		list_del_init(&job->list);
>  
>  		/* cancel this job's TO timer */
>  		cancel_delayed_work(&sched->work_tdr);
> -		/* make the scheduled timestamp more accurate */
> +
> +		*have_more = false;
>  		next = list_first_entry_or_null(&sched-
> >pending_list,
>  						typeof(*next),
> list);
> -
>  		if (next) {
> +			/* make the scheduled timestamp more
> accurate */
>  			if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
>  				     &next->s_fence-
> >scheduled.flags))
>  				next->s_fence->scheduled.timestamp =
>  					dma_fence_timestamp(&job-
> >s_fence->finished);
> +
> +			*have_more = dma_fence_is_signaled(&next-
> >s_fence->finished);
> +
>  			/* start TO timer for next job */
>  			drm_sched_start_timeout(sched);
>  		}
> @@ -1189,12 +1177,15 @@ static void drm_sched_free_job_work(struct
> work_struct *w)
>  	struct drm_gpu_scheduler *sched =
>  		container_of(w, struct drm_gpu_scheduler,
> work_free_job);
>  	struct drm_sched_job *job;
> +	bool have_more;
>  
> -	job = drm_sched_get_finished_job(sched);
> -	if (job)
> +	job = drm_sched_get_finished_job(sched, &have_more);
> +	if (job) {
>  		sched->ops->free_job(job);
> +		if (have_more)
> +			__drm_sched_run_free_queue(sched);
> +	}
>  
> -	drm_sched_run_free_queue(sched);
>  	drm_sched_run_job_queue(sched);
>  }
>  


* Re: [PATCH v6 05/15] drm/sched: Consolidate drm_sched_rq_select_entity_rr
  2025-07-08  9:51 ` [PATCH v6 05/15] drm/sched: Consolidate drm_sched_rq_select_entity_rr Tvrtko Ursulin
@ 2025-07-08 11:31   ` Philipp Stanner
  2025-07-08 12:21     ` Danilo Krummrich
  2025-07-08 12:23     ` Tvrtko Ursulin
  0 siblings, 2 replies; 31+ messages in thread
From: Philipp Stanner @ 2025-07-08 11:31 UTC (permalink / raw)
  To: Tvrtko Ursulin, dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Christian König,
	Danilo Krummrich, Matthew Brost, Philipp Stanner

On Tue, 2025-07-08 at 10:51 +0100, Tvrtko Ursulin wrote:
> Extract the two identical copies of the code into a common function
> epilogue to make it smaller and more readable.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>

Can also be split out.

P.

> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 48 +++++++++++-------------
> --
>  1 file changed, 20 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 2335df4de2b8..9212ebae7476 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -263,38 +263,14 @@ drm_sched_rq_select_entity_rr(struct
> drm_gpu_scheduler *sched,
>  	entity = rq->current_entity;
>  	if (entity) {
>  		list_for_each_entry_continue(entity, &rq->entities,
> list) {
> -			if (drm_sched_entity_is_ready(entity)) {
> -				/* If we can't queue yet, preserve
> the current
> -				 * entity in terms of fairness.
> -				 */
> -				if (!drm_sched_can_queue(sched,
> entity)) {
> -					spin_unlock(&rq->lock);
> -					return ERR_PTR(-ENOSPC);
> -				}
> -
> -				rq->current_entity = entity;
> -				reinit_completion(&entity-
> >entity_idle);
> -				spin_unlock(&rq->lock);
> -				return entity;
> -			}
> +			if (drm_sched_entity_is_ready(entity))
> +				goto found;
>  		}
>  	}
>  
>  	list_for_each_entry(entity, &rq->entities, list) {
> -		if (drm_sched_entity_is_ready(entity)) {
> -			/* If we can't queue yet, preserve the
> current entity in
> -			 * terms of fairness.
> -			 */
> -			if (!drm_sched_can_queue(sched, entity)) {
> -				spin_unlock(&rq->lock);
> -				return ERR_PTR(-ENOSPC);
> -			}
> -
> -			rq->current_entity = entity;
> -			reinit_completion(&entity->entity_idle);
> -			spin_unlock(&rq->lock);
> -			return entity;
> -		}
> +		if (drm_sched_entity_is_ready(entity))
> +			goto found;
>  
>  		if (entity == rq->current_entity)
>  			break;
> @@ -303,6 +279,22 @@ drm_sched_rq_select_entity_rr(struct
> drm_gpu_scheduler *sched,
>  	spin_unlock(&rq->lock);
>  
>  	return NULL;
> +
> +found:
> +	if (!drm_sched_can_queue(sched, entity)) {
> +		/*
> +		 * If scheduler cannot take more jobs signal the
> caller to not
> +		 * consider lower priority queues.
> +		 */
> +		entity = ERR_PTR(-ENOSPC);
> +	} else {
> +		rq->current_entity = entity;
> +		reinit_completion(&entity->entity_idle);
> +	}
> +
> +	spin_unlock(&rq->lock);
> +
> +	return entity;
>  }
>  
>  /**


* Re: [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation
  2025-07-08  9:51 ` [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation Tvrtko Ursulin
@ 2025-07-08 12:19   ` Philipp Stanner
  2025-07-08 12:28     ` Tvrtko Ursulin
  2025-07-08 12:37   ` Christian König
  1 sibling, 1 reply; 31+ messages in thread
From: Philipp Stanner @ 2025-07-08 12:19 UTC (permalink / raw)
  To: Tvrtko Ursulin, dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Christian König,
	Danilo Krummrich, Matthew Brost, Philipp Stanner

On Tue, 2025-07-08 at 10:51 +0100, Tvrtko Ursulin wrote:
> There is no reason to queue just a single job if the scheduler can
> take more, only to re-queue the worker to queue more. We can simply
> feed the hardware with as much as it can take in one go and hopefully
> win some latency.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> ---
>  drivers/gpu/drm/scheduler/sched_internal.h |   2 -
>  drivers/gpu/drm/scheduler/sched_main.c     | 132 ++++++++++---------
> --
>  drivers/gpu/drm/scheduler/sched_rq.c       |  12 +-
>  3 files changed, 64 insertions(+), 82 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_internal.h
> b/drivers/gpu/drm/scheduler/sched_internal.h
> index 15d78abc48df..1a5c2f255223 100644
> --- a/drivers/gpu/drm/scheduler/sched_internal.h
> +++ b/drivers/gpu/drm/scheduler/sched_internal.h
> @@ -22,8 +22,6 @@ struct drm_sched_entity_stats {
>  	u64		vruntime;
>  };
>  
> -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
> -			 struct drm_sched_entity *entity);
>  void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>  
>  void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 35025edea669..1fb3f1da4821 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -95,35 +95,6 @@ static u32 drm_sched_available_credits(struct
> drm_gpu_scheduler *sched)
>  	return credits;
>  }
>  
> -/**
> - * drm_sched_can_queue -- Can we queue more to the hardware?
> - * @sched: scheduler instance
> - * @entity: the scheduler entity
> - *
> - * Return true if we can push at least one more job from @entity,
> false
> - * otherwise.
> - */
> -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
> -			 struct drm_sched_entity *entity)
> -{
> -	struct drm_sched_job *s_job;
> -
> -	s_job = drm_sched_entity_queue_peek(entity);
> -	if (!s_job)
> -		return false;
> -
> -	/* If a job exceeds the credit limit, truncate it to the
> credit limit
> -	 * itself to guarantee forward progress.
> -	 */
> -	if (s_job->credits > sched->credit_limit) {
> -		dev_WARN(sched->dev,
> -			 "Jobs may not exceed the credit limit,
> truncate.\n");
> -		s_job->credits = sched->credit_limit;
> -	}
> -
> -	return drm_sched_available_credits(sched) >= s_job->credits;
> -}
> -
>  /**
>   * drm_sched_run_job_queue - enqueue run-job work
>   * @sched: scheduler instance
> @@ -940,54 +911,77 @@ static void drm_sched_run_job_work(struct
> work_struct *w)
>  {
>  	struct drm_gpu_scheduler *sched =
>  		container_of(w, struct drm_gpu_scheduler,
> work_run_job);
> +	u32 job_credits, submitted_credits = 0;
>  	struct drm_sched_entity *entity;
> -	struct dma_fence *fence;
>  	struct drm_sched_fence *s_fence;
>  	struct drm_sched_job *sched_job;
> -	int r;
> +	struct dma_fence *fence;
>  
> -	/* Find entity with a ready job */
> -	entity = drm_sched_rq_select_entity(sched, sched->rq);
> -	if (IS_ERR_OR_NULL(entity))
> -		return;	/* No more work */
> +	while (!READ_ONCE(sched->pause_submit)) {

I have it on my agenda to check whether we can remove these atomic
booleans once and for all. If we can't, we probably want to document
why they are there.
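
For reference, the flag pairs with the scheduler's stop/start helpers;
a minimal sketch of that pairing, close to but not necessarily
identical with the mainline drm_sched_wqueue_stop()/_start() code:

	/*
	 * Teardown sets the flag before cancelling the work items, so
	 * a worker that was queued concurrently bails out early
	 * instead of racing with the stop.
	 */
	void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
	{
		WRITE_ONCE(sched->pause_submit, true);
		cancel_work_sync(&sched->work_run_job);
		cancel_work_sync(&sched->work_free_job);
	}

	void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched)
	{
		WRITE_ONCE(sched->pause_submit, false);
		queue_work(sched->submit_wq, &sched->work_run_job);
		queue_work(sched->submit_wq, &sched->work_free_job);
	}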

> +		/* Find entity with a ready job */
> +		entity = drm_sched_rq_select_entity(sched, sched-
> >rq);
> +		if (!entity)
> +			break;	/* No more work */
> +
> +		sched_job = drm_sched_entity_queue_peek(entity);
> +		if (!sched_job) {
> +			complete_all(&entity->entity_idle);
> +			continue;
> +		}
> +
> +		job_credits = sched_job->credits;
> +		/*
> +		 * If a job exceeds the credit limit truncate it to
> guarantee
> +		 * forward progress.
> +		 */
> +		if (dev_WARN_ONCE(sched->dev, job_credits > sched-
> >credit_limit,
> +				  "Jobs may not exceed the credit
> limit, truncating.\n"))
> +			job_credits = sched_job->credits = sched-
> >credit_limit;
> +
> +		if (job_credits >
> drm_sched_available_credits(sched)) {
> +			complete_all(&entity->entity_idle);
> +			break;
> +		}
> +
> +		sched_job = drm_sched_entity_pop_job(entity);
> +		if (!sched_job) {
> +			/* Top entity is not yet runnable after all
> */
> +			complete_all(&entity->entity_idle);
> +			continue;
> +		}
> +
> +		s_fence = sched_job->s_fence;
> +		drm_sched_job_begin(sched_job);
> +		trace_drm_sched_job_run(sched_job, entity);
> +		submitted_credits += job_credits;
> +		atomic_add(job_credits, &sched->credit_count);
> +
> +		fence = sched->ops->run_job(sched_job);
> +		drm_sched_fence_scheduled(s_fence, fence);
> +
> +		if (!IS_ERR_OR_NULL(fence)) {
> +			int r;
> +
> +			/* Drop for original kref_init of the fence
> */
> +			dma_fence_put(fence);

Not sure if you're aware, but this is based on rather outdated code.
In mainline this function was reworked a while ago.


P.

> +
> +			r = dma_fence_add_callback(fence,
> &sched_job->cb,
> +						  
> drm_sched_job_done_cb);
> +			if (r == -ENOENT)
> +				drm_sched_job_done(sched_job, fence-
> >error);
> +			else if (r)
> +				DRM_DEV_ERROR(sched->dev,
> +					      "fence add callback
> failed (%d)\n", r);
> +		} else {
> +			drm_sched_job_done(sched_job, IS_ERR(fence)
> ?
> +						      PTR_ERR(fence)
> : 0);
> +		}
>  
> -	sched_job = drm_sched_entity_pop_job(entity);
> -	if (!sched_job) {
>  		complete_all(&entity->entity_idle);
> -		drm_sched_run_job_queue(sched);
> -		return;
>  	}
>  
> -	s_fence = sched_job->s_fence;
> -
> -	atomic_add(sched_job->credits, &sched->credit_count);
> -	drm_sched_job_begin(sched_job);
> -
> -	trace_drm_sched_job_run(sched_job, entity);
> -	/*
> -	 * The run_job() callback must by definition return a fence
> whose
> -	 * refcount has been incremented for the scheduler already.
> -	 */
> -	fence = sched->ops->run_job(sched_job);
> -	complete_all(&entity->entity_idle);
> -	drm_sched_fence_scheduled(s_fence, fence);
> -
> -	if (!IS_ERR_OR_NULL(fence)) {
> -		r = dma_fence_add_callback(fence, &sched_job->cb,
> -					   drm_sched_job_done_cb);
> -		if (r == -ENOENT)
> -			drm_sched_job_done(sched_job, fence->error);
> -		else if (r)
> -			DRM_DEV_ERROR(sched->dev, "fence add
> callback failed (%d)\n", r);
> -
> -		dma_fence_put(fence);
> -	} else {
> -		drm_sched_job_done(sched_job, IS_ERR(fence) ?
> -				   PTR_ERR(fence) : 0);
> -	}
> -
> -	wake_up(&sched->job_scheduled);
> -	drm_sched_run_job_queue(sched);
> +	if (submitted_credits)
> +		wake_up(&sched->job_scheduled);
>  }
>  
>  static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c
> b/drivers/gpu/drm/scheduler/sched_rq.c
> index e22f9ff88822..f0afdc0bd417 100644
> --- a/drivers/gpu/drm/scheduler/sched_rq.c
> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
> @@ -197,9 +197,7 @@ void drm_sched_rq_pop_entity(struct
> drm_sched_entity *entity)
>   *
>   * Find oldest waiting ready entity.
>   *
> - * Return an entity if one is found; return an error-pointer (!NULL)
> if an
> - * entity was ready, but the scheduler had insufficient credits to
> accommodate
> - * its job; return NULL, if no ready entity was found.
> + * Return an entity if one is found or NULL if no ready entity was
> found.
>   */
>  struct drm_sched_entity *
>  drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
> @@ -213,14 +211,6 @@ drm_sched_rq_select_entity(struct
> drm_gpu_scheduler *sched,
>  
>  		entity = rb_entry(rb, struct drm_sched_entity,
> rb_tree_node);
>  		if (drm_sched_entity_is_ready(entity)) {
> -			/* If we can't queue yet, preserve the
> current entity in
> -			 * terms of fairness.
> -			 */
> -			if (!drm_sched_can_queue(sched, entity)) {
> -				spin_unlock(&rq->lock);
> -				return ERR_PTR(-ENOSPC);
> -			}
> -
>  			reinit_completion(&entity->entity_idle);
>  			break;
>  		}


* Re: [PATCH v6 05/15] drm/sched: Consolidate drm_sched_rq_select_entity_rr
  2025-07-08 11:31   ` Philipp Stanner
@ 2025-07-08 12:21     ` Danilo Krummrich
  2025-07-08 12:23     ` Tvrtko Ursulin
  1 sibling, 0 replies; 31+ messages in thread
From: Danilo Krummrich @ 2025-07-08 12:21 UTC (permalink / raw)
  To: Philipp Stanner
  Cc: phasta, Tvrtko Ursulin, dri-devel, intel-xe, amd-gfx, kernel-dev,
	Christian König, Matthew Brost

On Tue Jul 8, 2025 at 1:31 PM CEST, Philipp Stanner wrote:
> On Tue, 2025-07-08 at 10:51 +0100, Tvrtko Ursulin wrote:
>> Extract the two identical copies of the code into a common function
>> epilogue to make it smaller and more readable.
>> 
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
>
> Can also be split out.

There seems to be something going on with your mail client. I have seen
it a couple of times now that it somehow messes up the original code in
your replies, such as below.

>> ---
>>  drivers/gpu/drm/scheduler/sched_main.c | 48 +++++++++++-------------
>> --
>>  1 file changed, 20 insertions(+), 28 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 2335df4de2b8..9212ebae7476 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -263,38 +263,14 @@ drm_sched_rq_select_entity_rr(struct
>> drm_gpu_scheduler *sched,
>>  	entity = rq->current_entity;
>>  	if (entity) {
>>  		list_for_each_entry_continue(entity, &rq->entities,
>> list) {
>> -			if (drm_sched_entity_is_ready(entity)) {
>> -				/* If we can't queue yet, preserve
>> the current
>> -				 * entity in terms of fairness.
>> -				 */
>> -				if (!drm_sched_can_queue(sched,
>> entity)) {
>> -					spin_unlock(&rq->lock);
>> -					return ERR_PTR(-ENOSPC);
>> -				}
>> -
>> -				rq->current_entity = entity;
>> -				reinit_completion(&entity-
>> >entity_idle);
>> -				spin_unlock(&rq->lock);
>> -				return entity;
>> -			}
>> +			if (drm_sched_entity_is_ready(entity))
>> +				goto found;
>>  		}
>>  	}
>>  
>>  	list_for_each_entry(entity, &rq->entities, list) {
>> -		if (drm_sched_entity_is_ready(entity)) {
>> -			/* If we can't queue yet, preserve the
>> current entity in
>> -			 * terms of fairness.
>> -			 */
>> -			if (!drm_sched_can_queue(sched, entity)) {
>> -				spin_unlock(&rq->lock);
>> -				return ERR_PTR(-ENOSPC);
>> -			}
>> -
>> -			rq->current_entity = entity;
>> -			reinit_completion(&entity->entity_idle);
>> -			spin_unlock(&rq->lock);
>> -			return entity;
>> -		}
>> +		if (drm_sched_entity_is_ready(entity))
>> +			goto found;
>>  
>>  		if (entity == rq->current_entity)
>>  			break;
>> @@ -303,6 +279,22 @@ drm_sched_rq_select_entity_rr(struct
>> drm_gpu_scheduler *sched,
>>  	spin_unlock(&rq->lock);
>>  
>>  	return NULL;
>> +
>> +found:
>> +	if (!drm_sched_can_queue(sched, entity)) {
>> +		/*
>> +		 * If scheduler cannot take more jobs signal the
>> caller to not
>> +		 * consider lower priority queues.
>> +		 */
>> +		entity = ERR_PTR(-ENOSPC);
>> +	} else {
>> +		rq->current_entity = entity;
>> +		reinit_completion(&entity->entity_idle);
>> +	}
>> +
>> +	spin_unlock(&rq->lock);
>> +
>> +	return entity;
>>  }
>>  
>>  /**


* Re: [PATCH v6 03/15] drm/sched: Avoid double re-lock on the job free path
  2025-07-08 11:22   ` Philipp Stanner
@ 2025-07-08 12:23     ` Tvrtko Ursulin
  0 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08 12:23 UTC (permalink / raw)
  To: phasta, dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Christian König,
	Danilo Krummrich, Matthew Brost


On 08/07/2025 12:22, Philipp Stanner wrote:
> On Tue, 2025-07-08 at 10:51 +0100, Tvrtko Ursulin wrote:
>> Currently the job free work item will lock sched->job_list_lock a
>> first time to see if there are any jobs, free a single job, and then
>> lock again to decide whether to re-queue itself if there are more
>> finished jobs.
>>
>> Since drm_sched_get_finished_job() already looks at the second job in
>> the queue, we can simply add the signaled check and have it return the
>> presence of more jobs to free to the caller. That way the work item
>> does not have to lock the list again and repeat the signaled check.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
> 
> This one can be sent separately, like the one for drm_sched_init()
> recently, can't it?

Done.

Regards,

Tvrtko

> P.
> 
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 37 ++++++++++--------------
>> --
>>   1 file changed, 14 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 1f077782ec12..1bce0b66f89c 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -366,22 +366,6 @@ static void __drm_sched_run_free_queue(struct
>> drm_gpu_scheduler *sched)
>>   		queue_work(sched->submit_wq, &sched->work_free_job);
>>   }
>>   
>> -/**
>> - * drm_sched_run_free_queue - enqueue free-job work if ready
>> - * @sched: scheduler instance
>> - */
>> -static void drm_sched_run_free_queue(struct drm_gpu_scheduler
>> *sched)
>> -{
>> -	struct drm_sched_job *job;
>> -
>> -	spin_lock(&sched->job_list_lock);
>> -	job = list_first_entry_or_null(&sched->pending_list,
>> -				       struct drm_sched_job, list);
>> -	if (job && dma_fence_is_signaled(&job->s_fence->finished))
>> -		__drm_sched_run_free_queue(sched);
>> -	spin_unlock(&sched->job_list_lock);
>> -}
>> -
>>   /**
>>    * drm_sched_job_done - complete a job
>>    * @s_job: pointer to the job which is done
>> @@ -1102,12 +1086,13 @@ drm_sched_select_entity(struct
>> drm_gpu_scheduler *sched)
>>    * drm_sched_get_finished_job - fetch the next finished job to be
>> destroyed
>>    *
>>    * @sched: scheduler instance
>> + * @have_more: are there more finished jobs on the list
>>    *
>>    * Returns the next finished job from the pending list (if there is
>> one)
>>    * ready for it to be destroyed.
>>    */
>>   static struct drm_sched_job *
>> -drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
>> +drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool
>> *have_more)
>>   {
>>   	struct drm_sched_job *job, *next;
>>   
>> @@ -1115,22 +1100,25 @@ drm_sched_get_finished_job(struct
>> drm_gpu_scheduler *sched)
>>   
>>   	job = list_first_entry_or_null(&sched->pending_list,
>>   				       struct drm_sched_job, list);
>> -
>>   	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>   		/* remove job from pending_list */
>>   		list_del_init(&job->list);
>>   
>>   		/* cancel this job's TO timer */
>>   		cancel_delayed_work(&sched->work_tdr);
>> -		/* make the scheduled timestamp more accurate */
>> +
>> +		*have_more = false;
>>   		next = list_first_entry_or_null(&sched-
>>> pending_list,
>>   						typeof(*next),
>> list);
>> -
>>   		if (next) {
>> +			/* make the scheduled timestamp more
>> accurate */
>>   			if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
>>   				     &next->s_fence-
>>> scheduled.flags))
>>   				next->s_fence->scheduled.timestamp =
>>   					dma_fence_timestamp(&job-
>>> s_fence->finished);
>> +
>> +			*have_more = dma_fence_is_signaled(&next-
>>> s_fence->finished);
>> +
>>   			/* start TO timer for next job */
>>   			drm_sched_start_timeout(sched);
>>   		}
>> @@ -1189,12 +1177,15 @@ static void drm_sched_free_job_work(struct
>> work_struct *w)
>>   	struct drm_gpu_scheduler *sched =
>>   		container_of(w, struct drm_gpu_scheduler,
>> work_free_job);
>>   	struct drm_sched_job *job;
>> +	bool have_more;
>>   
>> -	job = drm_sched_get_finished_job(sched);
>> -	if (job)
>> +	job = drm_sched_get_finished_job(sched, &have_more);
>> +	if (job) {
>>   		sched->ops->free_job(job);
>> +		if (have_more)
>> +			__drm_sched_run_free_queue(sched);
>> +	}
>>   
>> -	drm_sched_run_free_queue(sched);
>>   	drm_sched_run_job_queue(sched);
>>   }
>>   
> 


* Re: [PATCH v6 05/15] drm/sched: Consolidate drm_sched_rq_select_entity_rr
  2025-07-08 11:31   ` Philipp Stanner
  2025-07-08 12:21     ` Danilo Krummrich
@ 2025-07-08 12:23     ` Tvrtko Ursulin
  1 sibling, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08 12:23 UTC (permalink / raw)
  To: phasta, dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Christian König,
	Danilo Krummrich, Matthew Brost


On 08/07/2025 12:31, Philipp Stanner wrote:
> On Tue, 2025-07-08 at 10:51 +0100, Tvrtko Ursulin wrote:
>> Extract the two identical copies of the code into a common function
>> epilogue to make it smaller and more readable.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
> 
> Can also be split out.

Done as well.

Regards,

Tvrtko


> P.
> 
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 48 +++++++++++-------------
>> --
>>   1 file changed, 20 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 2335df4de2b8..9212ebae7476 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -263,38 +263,14 @@ drm_sched_rq_select_entity_rr(struct
>> drm_gpu_scheduler *sched,
>>   	entity = rq->current_entity;
>>   	if (entity) {
>>   		list_for_each_entry_continue(entity, &rq->entities,
>> list) {
>> -			if (drm_sched_entity_is_ready(entity)) {
>> -				/* If we can't queue yet, preserve
>> the current
>> -				 * entity in terms of fairness.
>> -				 */
>> -				if (!drm_sched_can_queue(sched,
>> entity)) {
>> -					spin_unlock(&rq->lock);
>> -					return ERR_PTR(-ENOSPC);
>> -				}
>> -
>> -				rq->current_entity = entity;
>> -				reinit_completion(&entity-
>>> entity_idle);
>> -				spin_unlock(&rq->lock);
>> -				return entity;
>> -			}
>> +			if (drm_sched_entity_is_ready(entity))
>> +				goto found;
>>   		}
>>   	}
>>   
>>   	list_for_each_entry(entity, &rq->entities, list) {
>> -		if (drm_sched_entity_is_ready(entity)) {
>> -			/* If we can't queue yet, preserve the
>> current entity in
>> -			 * terms of fairness.
>> -			 */
>> -			if (!drm_sched_can_queue(sched, entity)) {
>> -				spin_unlock(&rq->lock);
>> -				return ERR_PTR(-ENOSPC);
>> -			}
>> -
>> -			rq->current_entity = entity;
>> -			reinit_completion(&entity->entity_idle);
>> -			spin_unlock(&rq->lock);
>> -			return entity;
>> -		}
>> +		if (drm_sched_entity_is_ready(entity))
>> +			goto found;
>>   
>>   		if (entity == rq->current_entity)
>>   			break;
>> @@ -303,6 +279,22 @@ drm_sched_rq_select_entity_rr(struct
>> drm_gpu_scheduler *sched,
>>   	spin_unlock(&rq->lock);
>>   
>>   	return NULL;
>> +
>> +found:
>> +	if (!drm_sched_can_queue(sched, entity)) {
>> +		/*
>> +		 * If scheduler cannot take more jobs signal the
>> caller to not
>> +		 * consider lower priority queues.
>> +		 */
>> +		entity = ERR_PTR(-ENOSPC);
>> +	} else {
>> +		rq->current_entity = entity;
>> +		reinit_completion(&entity->entity_idle);
>> +	}
>> +
>> +	spin_unlock(&rq->lock);
>> +
>> +	return entity;
>>   }
>>   
>>   /**
> 


* Re: [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation
  2025-07-08 12:19   ` Philipp Stanner
@ 2025-07-08 12:28     ` Tvrtko Ursulin
  0 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08 12:28 UTC (permalink / raw)
  To: phasta, dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Christian König,
	Danilo Krummrich, Matthew Brost


On 08/07/2025 13:19, Philipp Stanner wrote:
> On Tue, 2025-07-08 at 10:51 +0100, Tvrtko Ursulin wrote:
>> There is no reason to queue just a single job if the scheduler can
>> take more, only to re-queue the worker to queue more. We can simply
>> feed the hardware with as much as it can take in one go and hopefully
>> win some latency.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
>> ---
>>   drivers/gpu/drm/scheduler/sched_internal.h |   2 -
>>   drivers/gpu/drm/scheduler/sched_main.c     | 132 ++++++++++---------
>> --
>>   drivers/gpu/drm/scheduler/sched_rq.c       |  12 +-
>>   3 files changed, 64 insertions(+), 82 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_internal.h
>> b/drivers/gpu/drm/scheduler/sched_internal.h
>> index 15d78abc48df..1a5c2f255223 100644
>> --- a/drivers/gpu/drm/scheduler/sched_internal.h
>> +++ b/drivers/gpu/drm/scheduler/sched_internal.h
>> @@ -22,8 +22,6 @@ struct drm_sched_entity_stats {
>>   	u64		vruntime;
>>   };
>>   
>> -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
>> -			 struct drm_sched_entity *entity);
>>   void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>>   
>>   void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 35025edea669..1fb3f1da4821 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -95,35 +95,6 @@ static u32 drm_sched_available_credits(struct
>> drm_gpu_scheduler *sched)
>>   	return credits;
>>   }
>>   
>> -/**
>> - * drm_sched_can_queue -- Can we queue more to the hardware?
>> - * @sched: scheduler instance
>> - * @entity: the scheduler entity
>> - *
>> - * Return true if we can push at least one more job from @entity,
>> false
>> - * otherwise.
>> - */
>> -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
>> -			 struct drm_sched_entity *entity)
>> -{
>> -	struct drm_sched_job *s_job;
>> -
>> -	s_job = drm_sched_entity_queue_peek(entity);
>> -	if (!s_job)
>> -		return false;
>> -
>> -	/* If a job exceeds the credit limit, truncate it to the
>> credit limit
>> -	 * itself to guarantee forward progress.
>> -	 */
>> -	if (s_job->credits > sched->credit_limit) {
>> -		dev_WARN(sched->dev,
>> -			 "Jobs may not exceed the credit limit,
>> truncate.\n");
>> -		s_job->credits = sched->credit_limit;
>> -	}
>> -
>> -	return drm_sched_available_credits(sched) >= s_job->credits;
>> -}
>> -
>>   /**
>>    * drm_sched_run_job_queue - enqueue run-job work
>>    * @sched: scheduler instance
>> @@ -940,54 +911,77 @@ static void drm_sched_run_job_work(struct
>> work_struct *w)
>>   {
>>   	struct drm_gpu_scheduler *sched =
>>   		container_of(w, struct drm_gpu_scheduler,
>> work_run_job);
>> +	u32 job_credits, submitted_credits = 0;
>>   	struct drm_sched_entity *entity;
>> -	struct dma_fence *fence;
>>   	struct drm_sched_fence *s_fence;
>>   	struct drm_sched_job *sched_job;
>> -	int r;
>> +	struct dma_fence *fence;
>>   
>> -	/* Find entity with a ready job */
>> -	entity = drm_sched_rq_select_entity(sched, sched->rq);
>> -	if (IS_ERR_OR_NULL(entity))
>> -		return;	/* No more work */
>> +	while (!READ_ONCE(sched->pause_submit)) {
> 
> I have it on my agenda to check whether we can remove these atomic
> booleans once and for all. If we can't, we probably want to document
> why they are there.

It would indeed be nice to remove them, but that is hopefully out of
scope for this series.

>> +		/* Find entity with a ready job */
>> +		entity = drm_sched_rq_select_entity(sched, sched-
>>> rq);
>> +		if (!entity)
>> +			break;	/* No more work */
>> +
>> +		sched_job = drm_sched_entity_queue_peek(entity);
>> +		if (!sched_job) {
>> +			complete_all(&entity->entity_idle);
>> +			continue;
>> +		}
>> +
>> +		job_credits = sched_job->credits;
>> +		/*
>> +		 * If a job exceeds the credit limit truncate it to
>> guarantee
>> +		 * forward progress.
>> +		 */
>> +		if (dev_WARN_ONCE(sched->dev, job_credits > sched-
>>> credit_limit,
>> +				  "Jobs may not exceed the credit
>> limit, truncating.\n"))
>> +			job_credits = sched_job->credits = sched-
>>> credit_limit;
>> +
>> +		if (job_credits >
>> drm_sched_available_credits(sched)) {
>> +			complete_all(&entity->entity_idle);
>> +			break;
>> +		}
>> +
>> +		sched_job = drm_sched_entity_pop_job(entity);
>> +		if (!sched_job) {
>> +			/* Top entity is not yet runnable after all
>> */
>> +			complete_all(&entity->entity_idle);
>> +			continue;
>> +		}
>> +
>> +		s_fence = sched_job->s_fence;
>> +		drm_sched_job_begin(sched_job);
>> +		trace_drm_sched_job_run(sched_job, entity);
>> +		submitted_credits += job_credits;
>> +		atomic_add(job_credits, &sched->credit_count);
>> +
>> +		fence = sched->ops->run_job(sched_job);
>> +		drm_sched_fence_scheduled(s_fence, fence);
>> +
>> +		if (!IS_ERR_OR_NULL(fence)) {
>> +			int r;
>> +
>> +			/* Drop for original kref_init of the fence
>> */
>> +			dma_fence_put(fence);
> 
> Not sure if you're aware, but this is based on rather outdated code.
> In mainline this function was reworked a while ago.

Well spotted, I missed 72ebc18b3499 ("drm/sched: Document run_job() 
refcount hazard"). Will fix.

Regards,

Tvrtko

>> +
>> +			r = dma_fence_add_callback(fence,
>> &sched_job->cb,
>> +						
>> drm_sched_job_done_cb);
>> +			if (r == -ENOENT)
>> +				drm_sched_job_done(sched_job, fence-
>>> error);
>> +			else if (r)
>> +				DRM_DEV_ERROR(sched->dev,
>> +					      "fence add callback
>> failed (%d)\n", r);
>> +		} else {
>> +			drm_sched_job_done(sched_job, IS_ERR(fence)
>> ?
>> +						      PTR_ERR(fence)
>> : 0);
>> +		}
>>   
>> -	sched_job = drm_sched_entity_pop_job(entity);
>> -	if (!sched_job) {
>>   		complete_all(&entity->entity_idle);
>> -		drm_sched_run_job_queue(sched);
>> -		return;
>>   	}
>>   
>> -	s_fence = sched_job->s_fence;
>> -
>> -	atomic_add(sched_job->credits, &sched->credit_count);
>> -	drm_sched_job_begin(sched_job);
>> -
>> -	trace_drm_sched_job_run(sched_job, entity);
>> -	/*
>> -	 * The run_job() callback must by definition return a fence
>> whose
>> -	 * refcount has been incremented for the scheduler already.
>> -	 */
>> -	fence = sched->ops->run_job(sched_job);
>> -	complete_all(&entity->entity_idle);
>> -	drm_sched_fence_scheduled(s_fence, fence);
>> -
>> -	if (!IS_ERR_OR_NULL(fence)) {
>> -		r = dma_fence_add_callback(fence, &sched_job->cb,
>> -					   drm_sched_job_done_cb);
>> -		if (r == -ENOENT)
>> -			drm_sched_job_done(sched_job, fence->error);
>> -		else if (r)
>> -			DRM_DEV_ERROR(sched->dev, "fence add
>> callback failed (%d)\n", r);
>> -
>> -		dma_fence_put(fence);
>> -	} else {
>> -		drm_sched_job_done(sched_job, IS_ERR(fence) ?
>> -				   PTR_ERR(fence) : 0);
>> -	}
>> -
>> -	wake_up(&sched->job_scheduled);
>> -	drm_sched_run_job_queue(sched);
>> +	if (submitted_credits)
>> +		wake_up(&sched->job_scheduled);
>>   }
>>   
>>   static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
>> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c
>> b/drivers/gpu/drm/scheduler/sched_rq.c
>> index e22f9ff88822..f0afdc0bd417 100644
>> --- a/drivers/gpu/drm/scheduler/sched_rq.c
>> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
>> @@ -197,9 +197,7 @@ void drm_sched_rq_pop_entity(struct
>> drm_sched_entity *entity)
>>    *
>>    * Find oldest waiting ready entity.
>>    *
>> - * Return an entity if one is found; return an error-pointer (!NULL)
>> if an
>> - * entity was ready, but the scheduler had insufficient credits to
>> accommodate
>> - * its job; return NULL, if no ready entity was found.
>> + * Return an entity if one is found or NULL if no ready entity was
>> found.
>>    */
>>   struct drm_sched_entity *
>>   drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
>> @@ -213,14 +211,6 @@ drm_sched_rq_select_entity(struct
>> drm_gpu_scheduler *sched,
>>   
>>   		entity = rb_entry(rb, struct drm_sched_entity,
>> rb_tree_node);
>>   		if (drm_sched_entity_is_ready(entity)) {
>> -			/* If we can't queue yet, preserve the
>> current entity in
>> -			 * terms of fairness.
>> -			 */
>> -			if (!drm_sched_can_queue(sched, entity)) {
>> -				spin_unlock(&rq->lock);
>> -				return ERR_PTR(-ENOSPC);
>> -			}
>> -
>>   			reinit_completion(&entity->entity_idle);
>>   			break;
>>   		}
> 


* Re: [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation
  2025-07-08  9:51 ` [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation Tvrtko Ursulin
  2025-07-08 12:19   ` Philipp Stanner
@ 2025-07-08 12:37   ` Christian König
  2025-07-08 12:54     ` Tvrtko Ursulin
  1 sibling, 1 reply; 31+ messages in thread
From: Christian König @ 2025-07-08 12:37 UTC (permalink / raw)
  To: Tvrtko Ursulin, dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

On 08.07.25 11:51, Tvrtko Ursulin wrote:
> There is no reason to queue just a single job if the scheduler can take
> more, only to re-queue the worker to queue more.

That's not correct. This was intentionally avoided.

If more than just the scheduler is using the single-threaded workqueue, other workers, especially the timeout worker, can jump in and execute first.

We explicitly removed submitting more than one job in each worker run.
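
To illustrate the concern with a self-contained sketch (all names made
up for the example): on an ordered workqueue the two work items execute
strictly one after another, so a run-job worker which loops over many
jobs delays the timeout handler by the length of the whole loop instead
of by a single job:

	#include <linux/workqueue.h>

	static void submit_work_fn(struct work_struct *w)
	{
		/* stand-in for drm_sched_run_job_work() */
	}

	static void timeout_work_fn(struct work_struct *w)
	{
		/* stand-in for the scheduler's timeout handler */
	}

	static DECLARE_WORK(submit_work, submit_work_fn);
	static DECLARE_DELAYED_WORK(timeout_work, timeout_work_fn);

	static int example_setup(void)
	{
		/* Ordered: max_active == 1, items run one by one. */
		struct workqueue_struct *wq =
			alloc_ordered_workqueue("hypothetical-sched", 0);

		if (!wq)
			return -ENOMEM;

		/* Sharing wq serializes submission with timeouts. */
		queue_work(wq, &submit_work);
		queue_delayed_work(wq, &timeout_work, HZ);

		return 0;
	}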

Regards,
Christian.

> We can simply feed the hardware
> with as much as it can take in one go and hopefully win some latency.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> ---
>  drivers/gpu/drm/scheduler/sched_internal.h |   2 -
>  drivers/gpu/drm/scheduler/sched_main.c     | 132 ++++++++++-----------
>  drivers/gpu/drm/scheduler/sched_rq.c       |  12 +-
>  3 files changed, 64 insertions(+), 82 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
> index 15d78abc48df..1a5c2f255223 100644
> --- a/drivers/gpu/drm/scheduler/sched_internal.h
> +++ b/drivers/gpu/drm/scheduler/sched_internal.h
> @@ -22,8 +22,6 @@ struct drm_sched_entity_stats {
>  	u64		vruntime;
>  };
>  
> -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
> -			 struct drm_sched_entity *entity);
>  void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>  
>  void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 35025edea669..1fb3f1da4821 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -95,35 +95,6 @@ static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
>  	return credits;
>  }
>  
> -/**
> - * drm_sched_can_queue -- Can we queue more to the hardware?
> - * @sched: scheduler instance
> - * @entity: the scheduler entity
> - *
> - * Return true if we can push at least one more job from @entity, false
> - * otherwise.
> - */
> -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
> -			 struct drm_sched_entity *entity)
> -{
> -	struct drm_sched_job *s_job;
> -
> -	s_job = drm_sched_entity_queue_peek(entity);
> -	if (!s_job)
> -		return false;
> -
> -	/* If a job exceeds the credit limit, truncate it to the credit limit
> -	 * itself to guarantee forward progress.
> -	 */
> -	if (s_job->credits > sched->credit_limit) {
> -		dev_WARN(sched->dev,
> -			 "Jobs may not exceed the credit limit, truncate.\n");
> -		s_job->credits = sched->credit_limit;
> -	}
> -
> -	return drm_sched_available_credits(sched) >= s_job->credits;
> -}
> -
>  /**
>   * drm_sched_run_job_queue - enqueue run-job work
>   * @sched: scheduler instance
> @@ -940,54 +911,77 @@ static void drm_sched_run_job_work(struct work_struct *w)
>  {
>  	struct drm_gpu_scheduler *sched =
>  		container_of(w, struct drm_gpu_scheduler, work_run_job);
> +	u32 job_credits, submitted_credits = 0;
>  	struct drm_sched_entity *entity;
> -	struct dma_fence *fence;
>  	struct drm_sched_fence *s_fence;
>  	struct drm_sched_job *sched_job;
> -	int r;
> +	struct dma_fence *fence;
>  
> -	/* Find entity with a ready job */
> -	entity = drm_sched_rq_select_entity(sched, sched->rq);
> -	if (IS_ERR_OR_NULL(entity))
> -		return;	/* No more work */
> +	while (!READ_ONCE(sched->pause_submit)) {
> +		/* Find entity with a ready job */
> +		entity = drm_sched_rq_select_entity(sched, sched->rq);
> +		if (!entity)
> +			break;	/* No more work */
> +
> +		sched_job = drm_sched_entity_queue_peek(entity);
> +		if (!sched_job) {
> +			complete_all(&entity->entity_idle);
> +			continue;
> +		}
> +
> +		job_credits = sched_job->credits;
> +		/*
> +		 * If a job exceeds the credit limit truncate it to guarantee
> +		 * forward progress.
> +		 */
> +		if (dev_WARN_ONCE(sched->dev, job_credits > sched->credit_limit,
> +				  "Jobs may not exceed the credit limit, truncating.\n"))
> +			job_credits = sched_job->credits = sched->credit_limit;
> +
> +		if (job_credits > drm_sched_available_credits(sched)) {
> +			complete_all(&entity->entity_idle);
> +			break;
> +		}
> +
> +		sched_job = drm_sched_entity_pop_job(entity);
> +		if (!sched_job) {
> +			/* Top entity is not yet runnable after all */
> +			complete_all(&entity->entity_idle);
> +			continue;
> +		}
> +
> +		s_fence = sched_job->s_fence;
> +		drm_sched_job_begin(sched_job);
> +		trace_drm_sched_job_run(sched_job, entity);
> +		submitted_credits += job_credits;
> +		atomic_add(job_credits, &sched->credit_count);
> +
> +		fence = sched->ops->run_job(sched_job);
> +		drm_sched_fence_scheduled(s_fence, fence);
> +
> +		if (!IS_ERR_OR_NULL(fence)) {
> +			int r;
> +
> +			/* Drop for original kref_init of the fence */
> +			dma_fence_put(fence);
> +
> +			r = dma_fence_add_callback(fence, &sched_job->cb,
> +						   drm_sched_job_done_cb);
> +			if (r == -ENOENT)
> +				drm_sched_job_done(sched_job, fence->error);
> +			else if (r)
> +				DRM_DEV_ERROR(sched->dev,
> +					      "fence add callback failed (%d)\n", r);
> +		} else {
> +			drm_sched_job_done(sched_job, IS_ERR(fence) ?
> +						      PTR_ERR(fence) : 0);
> +		}
>  
> -	sched_job = drm_sched_entity_pop_job(entity);
> -	if (!sched_job) {
>  		complete_all(&entity->entity_idle);
> -		drm_sched_run_job_queue(sched);
> -		return;
>  	}
>  
> -	s_fence = sched_job->s_fence;
> -
> -	atomic_add(sched_job->credits, &sched->credit_count);
> -	drm_sched_job_begin(sched_job);
> -
> -	trace_drm_sched_job_run(sched_job, entity);
> -	/*
> -	 * The run_job() callback must by definition return a fence whose
> -	 * refcount has been incremented for the scheduler already.
> -	 */
> -	fence = sched->ops->run_job(sched_job);
> -	complete_all(&entity->entity_idle);
> -	drm_sched_fence_scheduled(s_fence, fence);
> -
> -	if (!IS_ERR_OR_NULL(fence)) {
> -		r = dma_fence_add_callback(fence, &sched_job->cb,
> -					   drm_sched_job_done_cb);
> -		if (r == -ENOENT)
> -			drm_sched_job_done(sched_job, fence->error);
> -		else if (r)
> -			DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
> -
> -		dma_fence_put(fence);
> -	} else {
> -		drm_sched_job_done(sched_job, IS_ERR(fence) ?
> -				   PTR_ERR(fence) : 0);
> -	}
> -
> -	wake_up(&sched->job_scheduled);
> -	drm_sched_run_job_queue(sched);
> +	if (submitted_credits)
> +		wake_up(&sched->job_scheduled);
>  }
>  
>  static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
> index e22f9ff88822..f0afdc0bd417 100644
> --- a/drivers/gpu/drm/scheduler/sched_rq.c
> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
> @@ -197,9 +197,7 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
>   *
>   * Find oldest waiting ready entity.
>   *
> - * Return an entity if one is found; return an error-pointer (!NULL) if an
> - * entity was ready, but the scheduler had insufficient credits to accommodate
> - * its job; return NULL, if no ready entity was found.
> + * Return an entity if one is found or NULL if no ready entity was found.
>   */
>  struct drm_sched_entity *
>  drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
> @@ -213,14 +211,6 @@ drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
>  
>  		entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
>  		if (drm_sched_entity_is_ready(entity)) {
> -			/* If we can't queue yet, preserve the current entity in
> -			 * terms of fairness.
> -			 */
> -			if (!drm_sched_can_queue(sched, entity)) {
> -				spin_unlock(&rq->lock);
> -				return ERR_PTR(-ENOSPC);
> -			}
> -
>  			reinit_completion(&entity->entity_idle);
>  			break;
>  		}


* Re: [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation
  2025-07-08 12:37   ` Christian König
@ 2025-07-08 12:54     ` Tvrtko Ursulin
  2025-07-08 13:02       ` Christian König
  0 siblings, 1 reply; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08 12:54 UTC (permalink / raw)
  To: Christian König, dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Danilo Krummrich, Matthew Brost,
	Philipp Stanner


On 08/07/2025 13:37, Christian König wrote:
> On 08.07.25 11:51, Tvrtko Ursulin wrote:
>> There is no reason to queue just a single job if the scheduler can take
>> more, only to re-queue the worker to queue more.
> 
> That's not correct. This was intentionally avoided.
> 
> If more than just the scheduler is using the single-threaded workqueue, other workers, especially the timeout worker, can jump in and execute first.
> 
> We explicitly removed submitting more than one job in each worker run.

I wanted to ask why, but then I had a look to see if anyone actually 
does this. And I did not find any driver sharing a single threaded 
workqueue between submit and timeout.

The only driver which even passes in the same workqueue for both is PVR, 
but it is not a single threaded one.
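
For concreteness, such sharing would look roughly like this on the
driver side. This is only a sketch against the mainline
drm_sched_init_args interface at the time of writing, with made-up
driver names (hypothetical_sched_ops etc.), and with an ordered
workqueue, which is exactly what PVR does not use:

	#include <drm/gpu_scheduler.h>
	#include <linux/workqueue.h>

	static int hypothetical_sched_setup(struct device *dev,
					    struct drm_gpu_scheduler *sched)
	{
		struct workqueue_struct *wq =
			alloc_ordered_workqueue("hypothetical", 0);
		const struct drm_sched_init_args args = {
			.ops		= &hypothetical_sched_ops,
			.submit_wq	= wq,	/* same queue for both */
			.timeout_wq	= wq,	/* submission and timeouts */
			.num_rqs	= DRM_SCHED_PRIORITY_COUNT,
			.credit_limit	= 64,
			.timeout	= msecs_to_jiffies(500),
			.name		= "hypothetical",
			.dev		= dev,
		};

		if (!wq)
			return -ENOMEM;

		return drm_sched_init(sched, &args);
	}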

Or perhaps I misunderstood what you said. Could you please clarify 
either way?

Regards,

Tvrtko

>> We can simply feed the hardware
>> with as much as it can take in one go and hopefully win some latency.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
>> ---
>>   drivers/gpu/drm/scheduler/sched_internal.h |   2 -
>>   drivers/gpu/drm/scheduler/sched_main.c     | 132 ++++++++++-----------
>>   drivers/gpu/drm/scheduler/sched_rq.c       |  12 +-
>>   3 files changed, 64 insertions(+), 82 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
>> index 15d78abc48df..1a5c2f255223 100644
>> --- a/drivers/gpu/drm/scheduler/sched_internal.h
>> +++ b/drivers/gpu/drm/scheduler/sched_internal.h
>> @@ -22,8 +22,6 @@ struct drm_sched_entity_stats {
>>   	u64		vruntime;
>>   };
>>   
>> -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
>> -			 struct drm_sched_entity *entity);
>>   void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>>   
>>   void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 35025edea669..1fb3f1da4821 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -95,35 +95,6 @@ static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
>>   	return credits;
>>   }
>>   
>> -/**
>> - * drm_sched_can_queue -- Can we queue more to the hardware?
>> - * @sched: scheduler instance
>> - * @entity: the scheduler entity
>> - *
>> - * Return true if we can push at least one more job from @entity, false
>> - * otherwise.
>> - */
>> -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
>> -			 struct drm_sched_entity *entity)
>> -{
>> -	struct drm_sched_job *s_job;
>> -
>> -	s_job = drm_sched_entity_queue_peek(entity);
>> -	if (!s_job)
>> -		return false;
>> -
>> -	/* If a job exceeds the credit limit, truncate it to the credit limit
>> -	 * itself to guarantee forward progress.
>> -	 */
>> -	if (s_job->credits > sched->credit_limit) {
>> -		dev_WARN(sched->dev,
>> -			 "Jobs may not exceed the credit limit, truncate.\n");
>> -		s_job->credits = sched->credit_limit;
>> -	}
>> -
>> -	return drm_sched_available_credits(sched) >= s_job->credits;
>> -}
>> -
>>   /**
>>    * drm_sched_run_job_queue - enqueue run-job work
>>    * @sched: scheduler instance
>> @@ -940,54 +911,77 @@ static void drm_sched_run_job_work(struct work_struct *w)
>>   {
>>   	struct drm_gpu_scheduler *sched =
>>   		container_of(w, struct drm_gpu_scheduler, work_run_job);
>> +	u32 job_credits, submitted_credits = 0;
>>   	struct drm_sched_entity *entity;
>> -	struct dma_fence *fence;
>>   	struct drm_sched_fence *s_fence;
>>   	struct drm_sched_job *sched_job;
>> -	int r;
>> +	struct dma_fence *fence;
>>   
>> -	/* Find entity with a ready job */
>> -	entity = drm_sched_rq_select_entity(sched, sched->rq);
>> -	if (IS_ERR_OR_NULL(entity))
>> -		return;	/* No more work */
>> +	while (!READ_ONCE(sched->pause_submit)) {
>> +		/* Find entity with a ready job */
>> +		entity = drm_sched_rq_select_entity(sched, sched->rq);
>> +		if (!entity)
>> +			break;	/* No more work */
>> +
>> +		sched_job = drm_sched_entity_queue_peek(entity);
>> +		if (!sched_job) {
>> +			complete_all(&entity->entity_idle);
>> +			continue;
>> +		}
>> +
>> +		job_credits = sched_job->credits;
>> +		/*
>> +		 * If a job exceeds the credit limit truncate it to guarantee
>> +		 * forward progress.
>> +		 */
>> +		if (dev_WARN_ONCE(sched->dev, job_credits > sched->credit_limit,
>> +				  "Jobs may not exceed the credit limit, truncating.\n"))
>> +			job_credits = sched_job->credits = sched->credit_limit;
>> +
>> +		if (job_credits > drm_sched_available_credits(sched)) {
>> +			complete_all(&entity->entity_idle);
>> +			break;
>> +		}
>> +
>> +		sched_job = drm_sched_entity_pop_job(entity);
>> +		if (!sched_job) {
>> +			/* Top entity is not yet runnable after all */
>> +			complete_all(&entity->entity_idle);
>> +			continue;
>> +		}
>> +
>> +		s_fence = sched_job->s_fence;
>> +		drm_sched_job_begin(sched_job);
>> +		trace_drm_sched_job_run(sched_job, entity);
>> +		submitted_credits += job_credits;
>> +		atomic_add(job_credits, &sched->credit_count);
>> +
>> +		fence = sched->ops->run_job(sched_job);
>> +		drm_sched_fence_scheduled(s_fence, fence);
>> +
>> +		if (!IS_ERR_OR_NULL(fence)) {
>> +			int r;
>> +
>> +			/* Drop for original kref_init of the fence */
>> +			dma_fence_put(fence);
>> +
>> +			r = dma_fence_add_callback(fence, &sched_job->cb,
>> +						   drm_sched_job_done_cb);
>> +			if (r == -ENOENT)
>> +				drm_sched_job_done(sched_job, fence->error);
>> +			else if (r)
>> +				DRM_DEV_ERROR(sched->dev,
>> +					      "fence add callback failed (%d)\n", r);
>> +		} else {
>> +			drm_sched_job_done(sched_job, IS_ERR(fence) ?
>> +						      PTR_ERR(fence) : 0);
>> +		}
>>   
>> -	sched_job = drm_sched_entity_pop_job(entity);
>> -	if (!sched_job) {
>>   		complete_all(&entity->entity_idle);
>> -		drm_sched_run_job_queue(sched);
>> -		return;
>>   	}
>>   
>> -	s_fence = sched_job->s_fence;
>> -
>> -	atomic_add(sched_job->credits, &sched->credit_count);
>> -	drm_sched_job_begin(sched_job);
>> -
>> -	trace_drm_sched_job_run(sched_job, entity);
>> -	/*
>> -	 * The run_job() callback must by definition return a fence whose
>> -	 * refcount has been incremented for the scheduler already.
>> -	 */
>> -	fence = sched->ops->run_job(sched_job);
>> -	complete_all(&entity->entity_idle);
>> -	drm_sched_fence_scheduled(s_fence, fence);
>> -
>> -	if (!IS_ERR_OR_NULL(fence)) {
>> -		r = dma_fence_add_callback(fence, &sched_job->cb,
>> -					   drm_sched_job_done_cb);
>> -		if (r == -ENOENT)
>> -			drm_sched_job_done(sched_job, fence->error);
>> -		else if (r)
>> -			DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
>> -
>> -		dma_fence_put(fence);
>> -	} else {
>> -		drm_sched_job_done(sched_job, IS_ERR(fence) ?
>> -				   PTR_ERR(fence) : 0);
>> -	}
>> -
>> -	wake_up(&sched->job_scheduled);
>> -	drm_sched_run_job_queue(sched);
>> +	if (submitted_credits)
>> +		wake_up(&sched->job_scheduled);
>>   }
>>   
>>   static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
>> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
>> index e22f9ff88822..f0afdc0bd417 100644
>> --- a/drivers/gpu/drm/scheduler/sched_rq.c
>> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
>> @@ -197,9 +197,7 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
>>    *
>>    * Find oldest waiting ready entity.
>>    *
>> - * Return an entity if one is found; return an error-pointer (!NULL) if an
>> - * entity was ready, but the scheduler had insufficient credits to accommodate
>> - * its job; return NULL, if no ready entity was found.
>> + * Return an entity if one is found or NULL if no ready entity was found.
>>    */
>>   struct drm_sched_entity *
>>   drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
>> @@ -213,14 +211,6 @@ drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
>>   
>>   		entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
>>   		if (drm_sched_entity_is_ready(entity)) {
>> -			/* If we can't queue yet, preserve the current entity in
>> -			 * terms of fairness.
>> -			 */
>> -			if (!drm_sched_can_queue(sched, entity)) {
>> -				spin_unlock(&rq->lock);
>> -				return ERR_PTR(-ENOSPC);
>> -			}
>> -
>>   			reinit_completion(&entity->entity_idle);
>>   			break;
>>   		}
> 


* Re: [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation
  2025-07-08 12:54     ` Tvrtko Ursulin
@ 2025-07-08 13:02       ` Christian König
  2025-07-08 15:31         ` Tvrtko Ursulin
  0 siblings, 1 reply; 31+ messages in thread
From: Christian König @ 2025-07-08 13:02 UTC (permalink / raw)
  To: Tvrtko Ursulin, dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

On 08.07.25 14:54, Tvrtko Ursulin wrote:
> 
> On 08/07/2025 13:37, Christian König wrote:
>> On 08.07.25 11:51, Tvrtko Ursulin wrote:
>>> There is no reason to queue just a single job if the scheduler can take
>>> more, only to re-queue the worker to queue more.
>>
>> That's not correct. This was intentionally avoided.
>>
>> If more than just the scheduler is using the single-threaded workqueue, other workers, especially the timeout worker, can jump in and execute first.
>>
>> We explicitly removed submitting more than one job in each worker run.
> 
> I wanted to ask why, but then I had a look to see if anyone actually does this. And I did not find any driver sharing a single threaded workqueue between submit and timeout.
> 
> The only driver which even passes in the same workqueue for both is PVR, but it is not a single threaded one.
> 
> Or perhaps I misunderstood what you said. Could you please clarify either way?

You correctly understood that.

The argument was that submitting more than one job per worker run is simply not beneficial, and that other work items can jump in and execute in between.

I have no idea if that is actually relied upon or not. You would need to dig up the discussion from when we switched from a kernel thread to work items for the full background.

But in general, doing as little work as possible in each worker and then re-submitting is usually a good idea.
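
In other words, the canonical pattern the work items have followed so
far; a simplified sketch with made-up names, not actual scheduler code:

	#include <linux/workqueue.h>

	struct my_ctx {
		struct workqueue_struct *wq;
		struct work_struct work;
	};

	static void one_shot_work(struct work_struct *w)
	{
		struct my_ctx *ctx = container_of(w, struct my_ctx, work);

		do_one_unit_of_work(ctx);	/* hypothetical helper */

		/*
		 * Re-queue instead of looping: other items on the same
		 * (possibly ordered) workqueue get a chance to run
		 * between two units of work.
		 */
		if (have_more_work(ctx))	/* hypothetical helper */
			queue_work(ctx->wq, &ctx->work);
	}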

Regards,
Christian.

> 
> Regards,
> 
> Tvrtko
> 
>>> We can simply feed the hardware
>>> with as much as it can take in one go and hopefully win some latency.
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Danilo Krummrich <dakr@kernel.org>
>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>> Cc: Philipp Stanner <phasta@kernel.org>
>>> ---
>>>   drivers/gpu/drm/scheduler/sched_internal.h |   2 -
>>>   drivers/gpu/drm/scheduler/sched_main.c     | 132 ++++++++++-----------
>>>   drivers/gpu/drm/scheduler/sched_rq.c       |  12 +-
>>>   3 files changed, 64 insertions(+), 82 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
>>> index 15d78abc48df..1a5c2f255223 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_internal.h
>>> +++ b/drivers/gpu/drm/scheduler/sched_internal.h
>>> @@ -22,8 +22,6 @@ struct drm_sched_entity_stats {
>>>       u64        vruntime;
>>>   };
>>>   -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
>>> -             struct drm_sched_entity *entity);
>>>   void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>>>     void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 35025edea669..1fb3f1da4821 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -95,35 +95,6 @@ static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
>>>       return credits;
>>>   }
>>>   -/**
>>> - * drm_sched_can_queue -- Can we queue more to the hardware?
>>> - * @sched: scheduler instance
>>> - * @entity: the scheduler entity
>>> - *
>>> - * Return true if we can push at least one more job from @entity, false
>>> - * otherwise.
>>> - */
>>> -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
>>> -             struct drm_sched_entity *entity)
>>> -{
>>> -    struct drm_sched_job *s_job;
>>> -
>>> -    s_job = drm_sched_entity_queue_peek(entity);
>>> -    if (!s_job)
>>> -        return false;
>>> -
>>> -    /* If a job exceeds the credit limit, truncate it to the credit limit
>>> -     * itself to guarantee forward progress.
>>> -     */
>>> -    if (s_job->credits > sched->credit_limit) {
>>> -        dev_WARN(sched->dev,
>>> -             "Jobs may not exceed the credit limit, truncate.\n");
>>> -        s_job->credits = sched->credit_limit;
>>> -    }
>>> -
>>> -    return drm_sched_available_credits(sched) >= s_job->credits;
>>> -}
>>> -
>>>   /**
>>>    * drm_sched_run_job_queue - enqueue run-job work
>>>    * @sched: scheduler instance
>>> @@ -940,54 +911,77 @@ static void drm_sched_run_job_work(struct work_struct *w)
>>>   {
>>>       struct drm_gpu_scheduler *sched =
>>>           container_of(w, struct drm_gpu_scheduler, work_run_job);
>>> +    u32 job_credits, submitted_credits = 0;
>>>       struct drm_sched_entity *entity;
>>> -    struct dma_fence *fence;
>>>       struct drm_sched_fence *s_fence;
>>>       struct drm_sched_job *sched_job;
>>> -    int r;
>>> +    struct dma_fence *fence;
>>>   -    /* Find entity with a ready job */
>>> -    entity = drm_sched_rq_select_entity(sched, sched->rq);
>>> -    if (IS_ERR_OR_NULL(entity))
>>> -        return;    /* No more work */
>>> +    while (!READ_ONCE(sched->pause_submit)) {
>>> +        /* Find entity with a ready job */
>>> +        entity = drm_sched_rq_select_entity(sched, sched->rq);
>>> +        if (!entity)
>>> +            break;    /* No more work */
>>> +
>>> +        sched_job = drm_sched_entity_queue_peek(entity);
>>> +        if (!sched_job) {
>>> +            complete_all(&entity->entity_idle);
>>> +            continue;
>>> +        }
>>> +
>>> +        job_credits = sched_job->credits;
>>> +        /*
>>> +         * If a job exceeds the credit limit truncate it to guarantee
>>> +         * forward progress.
>>> +         */
>>> +        if (dev_WARN_ONCE(sched->dev, job_credits > sched->credit_limit,
>>> +                  "Jobs may not exceed the credit limit, truncating.\n"))
>>> +            job_credits = sched_job->credits = sched->credit_limit;
>>> +
>>> +        if (job_credits > drm_sched_available_credits(sched)) {
>>> +            complete_all(&entity->entity_idle);
>>> +            break;
>>> +        }
>>> +
>>> +        sched_job = drm_sched_entity_pop_job(entity);
>>> +        if (!sched_job) {
>>> +            /* Top entity is not yet runnable after all */
>>> +            complete_all(&entity->entity_idle);
>>> +            continue;
>>> +        }
>>> +
>>> +        s_fence = sched_job->s_fence;
>>> +        drm_sched_job_begin(sched_job);
>>> +        trace_drm_sched_job_run(sched_job, entity);
>>> +        submitted_credits += job_credits;
>>> +        atomic_add(job_credits, &sched->credit_count);
>>> +
>>> +        fence = sched->ops->run_job(sched_job);
>>> +        drm_sched_fence_scheduled(s_fence, fence);
>>> +
>>> +        if (!IS_ERR_OR_NULL(fence)) {
>>> +            int r;
>>> +
>>> +            /* Drop for original kref_init of the fence */
>>> +            dma_fence_put(fence);
>>> +
>>> +            r = dma_fence_add_callback(fence, &sched_job->cb,
>>> +                           drm_sched_job_done_cb);
>>> +            if (r == -ENOENT)
>>> +                drm_sched_job_done(sched_job, fence->error);
>>> +            else if (r)
>>> +                DRM_DEV_ERROR(sched->dev,
>>> +                          "fence add callback failed (%d)\n", r);
>>> +        } else {
>>> +            drm_sched_job_done(sched_job, IS_ERR(fence) ?
>>> +                              PTR_ERR(fence) : 0);
>>> +        }
>>>   -    sched_job = drm_sched_entity_pop_job(entity);
>>> -    if (!sched_job) {
>>>           complete_all(&entity->entity_idle);
>>> -        drm_sched_run_job_queue(sched);
>>> -        return;
>>>       }
>>>   -    s_fence = sched_job->s_fence;
>>> -
>>> -    atomic_add(sched_job->credits, &sched->credit_count);
>>> -    drm_sched_job_begin(sched_job);
>>> -
>>> -    trace_drm_sched_job_run(sched_job, entity);
>>> -    /*
>>> -     * The run_job() callback must by definition return a fence whose
>>> -     * refcount has been incremented for the scheduler already.
>>> -     */
>>> -    fence = sched->ops->run_job(sched_job);
>>> -    complete_all(&entity->entity_idle);
>>> -    drm_sched_fence_scheduled(s_fence, fence);
>>> -
>>> -    if (!IS_ERR_OR_NULL(fence)) {
>>> -        r = dma_fence_add_callback(fence, &sched_job->cb,
>>> -                       drm_sched_job_done_cb);
>>> -        if (r == -ENOENT)
>>> -            drm_sched_job_done(sched_job, fence->error);
>>> -        else if (r)
>>> -            DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
>>> -
>>> -        dma_fence_put(fence);
>>> -    } else {
>>> -        drm_sched_job_done(sched_job, IS_ERR(fence) ?
>>> -                   PTR_ERR(fence) : 0);
>>> -    }
>>> -
>>> -    wake_up(&sched->job_scheduled);
>>> -    drm_sched_run_job_queue(sched);
>>> +    if (submitted_credits)
>>> +        wake_up(&sched->job_scheduled);
>>>   }
>>>     static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
>>> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
>>> index e22f9ff88822..f0afdc0bd417 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_rq.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
>>> @@ -197,9 +197,7 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
>>>    *
>>>    * Find oldest waiting ready entity.
>>>    *
>>> - * Return an entity if one is found; return an error-pointer (!NULL) if an
>>> - * entity was ready, but the scheduler had insufficient credits to accommodate
>>> - * its job; return NULL, if no ready entity was found.
>>> + * Return an entity if one is found or NULL if no ready entity was found.
>>>    */
>>>   struct drm_sched_entity *
>>>   drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
>>> @@ -213,14 +211,6 @@ drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
>>>             entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
>>>           if (drm_sched_entity_is_ready(entity)) {
>>> -            /* If we can't queue yet, preserve the current entity in
>>> -             * terms of fairness.
>>> -             */
>>> -            if (!drm_sched_can_queue(sched, entity)) {
>>> -                spin_unlock(&rq->lock);
>>> -                return ERR_PTR(-ENOSPC);
>>> -            }
>>> -
>>>               reinit_completion(&entity->entity_idle);
>>>               break;
>>>           }
>>
> 



* Re: [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation
  2025-07-08 13:02       ` Christian König
@ 2025-07-08 15:31         ` Tvrtko Ursulin
  2025-07-08 18:59           ` Matthew Brost
  2025-07-09  8:57           ` Christian König
  0 siblings, 2 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2025-07-08 15:31 UTC (permalink / raw)
  To: Christian König, dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Danilo Krummrich, Matthew Brost,
	Philipp Stanner


On 08/07/2025 14:02, Christian König wrote:
> On 08.07.25 14:54, Tvrtko Ursulin wrote:
>>
>> On 08/07/2025 13:37, Christian König wrote:
>>> On 08.07.25 11:51, Tvrtko Ursulin wrote:
>>>> There is no reason to queue just a single job and re-queue the worker
>>>> if the scheduler can take more.
>>>
>>> That's not correct. This was intentionally avoided.
>>>
>>> If more than just the scheduler is using the single-threaded workqueue, other workers, especially the timeout worker, can jump in and execute first.
>>>
>>> We explicitly removed submitting more than one job in each worker run.
>>
>> I wanted to ask why, but then I had a look to see if anyone actually does this. And I did not find any driver sharing a single threaded workqueue between submit and timeout.
>>
>> The only driver which even passes in the same workqueue for both is PVR, but it is not a single threaded one.
>>
>> Or perhaps I misunderstood what you said. Could you please clarify either way?
> 
> You correctly understood that.
> 
> The argument was that submitting more than one job in a worker is simply not beneficial and other work items can jump in and execute.
> 
> I have no idea if that is actually used or not. You would need to dig up the discussion when we switched from a kernel thread to work items for the full background.
> 
> But in general to do as less work as possible in each worker and then re-submit it is usually a good idea.

 From the point of view that a single work item invocation shouldn't 
hog the worker, if the worker is shared, I agree. But what we also want 
is to feed the GPU as fast as possible, i.e. to put the CPU to sleep as 
quickly as possible.

If we consider drivers with dedicated workqueues per hardware engine, or 
even per userspace context, then especially in those cases I don't see 
the benefit of playing the wq re-queue game.
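
For illustration, a minimal sketch of that setup; all my_* names are made
up, and the args fields assume the drm_sched_init_args interface:

#include <drm/gpu_scheduler.h>

struct my_engine {	/* made up for the sketch */
	struct drm_gpu_scheduler sched;
	struct workqueue_struct *submit_wq;
	u32 ring_size;
	const char *name;
	struct device *dev;
};

static const struct drm_sched_backend_ops my_sched_ops; /* run_job etc. elided */

static int my_engine_sched_init(struct my_engine *e)
{
	struct drm_sched_init_args args = { };

	/*
	 * One ordered wq per engine, not shared with timeout handling;
	 * WQ_MEM_RECLAIM because submission is on the dma-fence path.
	 */
	e->submit_wq = alloc_ordered_workqueue("%s-submit", WQ_MEM_RECLAIM,
					       e->name);
	if (!e->submit_wq)
		return -ENOMEM;

	args.ops = &my_sched_ops;
	args.submit_wq = e->submit_wq;
	args.num_rqs = DRM_SCHED_PRIORITY_COUNT;
	args.credit_limit = e->ring_size;
	args.timeout = msecs_to_jiffies(1000);
	args.name = e->name;
	args.dev = e->dev;

	return drm_sched_init(&e->sched, &args);
}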

Anyway, I can park this patch for now. I *think* it will be easy to drop; 
15/15 will just need a rebase to cope.

In the meantime I have collected some stats while running the Cyberpunk 
2077 benchmark on amdgpu, just to remind myself that it does happen that 
more than one job can be ready to be passed on to the GPU. Number of jobs 
submitted per worker invocation (with this patch):

		1	2	3	4	5
gfx_0.0.0	21315	541	9849	171	0
comp_1.3.0	3093	9	2	0	0
comp_1.1.0	3501	46	2	1	0
comp_1.0.1	3451	46	2	0	0
sdma0		4400	746	279	481	7

This is for userspace contexts only. A good number of worker invocations 
submitted three jobs.
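
(Summing the gfx_0.0.0 row: 10561 of 31876 invocations, roughly a third, 
submitted more than one job.)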

Kernel sdma appears to favour deeper queues even more, but I forgot to 
log counts above two jobs per worker invocation:

	1	2	
sdma0	8009	1913
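
(That is 1913 of 9922 invocations, roughly 19%, with at least two jobs 
ready.)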

I can try to measure the latencies of the worker re-queue approach. 
Another interesting thing would be C-state residencies and CPU power. But 
given that when the scheduler went from kthread to wq it lost the ability 
to queue more than one job, I don't think anyone measured this back then? 
In which case I suspect we don't even know whether some latency or 
efficiency was lost.

Regards,

Tvrtko

>>>> We can simply feed the hardware
>>>> with as much as it can take in one go and hopefully win some latency.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Cc: Danilo Krummrich <dakr@kernel.org>
>>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>>> Cc: Philipp Stanner <phasta@kernel.org>
>>>> ---
>>>>    drivers/gpu/drm/scheduler/sched_internal.h |   2 -
>>>>    drivers/gpu/drm/scheduler/sched_main.c     | 132 ++++++++++-----------
>>>>    drivers/gpu/drm/scheduler/sched_rq.c       |  12 +-
>>>>    3 files changed, 64 insertions(+), 82 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
>>>> index 15d78abc48df..1a5c2f255223 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_internal.h
>>>> +++ b/drivers/gpu/drm/scheduler/sched_internal.h
>>>> @@ -22,8 +22,6 @@ struct drm_sched_entity_stats {
>>>>        u64        vruntime;
>>>>    };
>>>>    -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
>>>> -             struct drm_sched_entity *entity);
>>>>    void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>>>>      void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index 35025edea669..1fb3f1da4821 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -95,35 +95,6 @@ static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
>>>>        return credits;
>>>>    }
>>>>    -/**
>>>> - * drm_sched_can_queue -- Can we queue more to the hardware?
>>>> - * @sched: scheduler instance
>>>> - * @entity: the scheduler entity
>>>> - *
>>>> - * Return true if we can push at least one more job from @entity, false
>>>> - * otherwise.
>>>> - */
>>>> -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
>>>> -             struct drm_sched_entity *entity)
>>>> -{
>>>> -    struct drm_sched_job *s_job;
>>>> -
>>>> -    s_job = drm_sched_entity_queue_peek(entity);
>>>> -    if (!s_job)
>>>> -        return false;
>>>> -
>>>> -    /* If a job exceeds the credit limit, truncate it to the credit limit
>>>> -     * itself to guarantee forward progress.
>>>> -     */
>>>> -    if (s_job->credits > sched->credit_limit) {
>>>> -        dev_WARN(sched->dev,
>>>> -             "Jobs may not exceed the credit limit, truncate.\n");
>>>> -        s_job->credits = sched->credit_limit;
>>>> -    }
>>>> -
>>>> -    return drm_sched_available_credits(sched) >= s_job->credits;
>>>> -}
>>>> -
>>>>    /**
>>>>     * drm_sched_run_job_queue - enqueue run-job work
>>>>     * @sched: scheduler instance
>>>> @@ -940,54 +911,77 @@ static void drm_sched_run_job_work(struct work_struct *w)
>>>>    {
>>>>        struct drm_gpu_scheduler *sched =
>>>>            container_of(w, struct drm_gpu_scheduler, work_run_job);
>>>> +    u32 job_credits, submitted_credits = 0;
>>>>        struct drm_sched_entity *entity;
>>>> -    struct dma_fence *fence;
>>>>        struct drm_sched_fence *s_fence;
>>>>        struct drm_sched_job *sched_job;
>>>> -    int r;
>>>> +    struct dma_fence *fence;
>>>>    -    /* Find entity with a ready job */
>>>> -    entity = drm_sched_rq_select_entity(sched, sched->rq);
>>>> -    if (IS_ERR_OR_NULL(entity))
>>>> -        return;    /* No more work */
>>>> +    while (!READ_ONCE(sched->pause_submit)) {
>>>> +        /* Find entity with a ready job */
>>>> +        entity = drm_sched_rq_select_entity(sched, sched->rq);
>>>> +        if (!entity)
>>>> +            break;    /* No more work */
>>>> +
>>>> +        sched_job = drm_sched_entity_queue_peek(entity);
>>>> +        if (!sched_job) {
>>>> +            complete_all(&entity->entity_idle);
>>>> +            continue;
>>>> +        }
>>>> +
>>>> +        job_credits = sched_job->credits;
>>>> +        /*
>>>> +         * If a job exceeds the credit limit truncate it to guarantee
>>>> +         * forward progress.
>>>> +         */
>>>> +        if (dev_WARN_ONCE(sched->dev, job_credits > sched->credit_limit,
>>>> +                  "Jobs may not exceed the credit limit, truncating.\n"))
>>>> +            job_credits = sched_job->credits = sched->credit_limit;
>>>> +
>>>> +        if (job_credits > drm_sched_available_credits(sched)) {
>>>> +            complete_all(&entity->entity_idle);
>>>> +            break;
>>>> +        }
>>>> +
>>>> +        sched_job = drm_sched_entity_pop_job(entity);
>>>> +        if (!sched_job) {
>>>> +            /* Top entity is not yet runnable after all */
>>>> +            complete_all(&entity->entity_idle);
>>>> +            continue;
>>>> +        }
>>>> +
>>>> +        s_fence = sched_job->s_fence;
>>>> +        drm_sched_job_begin(sched_job);
>>>> +        trace_drm_sched_job_run(sched_job, entity);
>>>> +        submitted_credits += job_credits;
>>>> +        atomic_add(job_credits, &sched->credit_count);
>>>> +
>>>> +        fence = sched->ops->run_job(sched_job);
>>>> +        drm_sched_fence_scheduled(s_fence, fence);
>>>> +
>>>> +        if (!IS_ERR_OR_NULL(fence)) {
>>>> +            int r;
>>>> +
>>>> +            /* Drop for original kref_init of the fence */
>>>> +            dma_fence_put(fence);
>>>> +
>>>> +            r = dma_fence_add_callback(fence, &sched_job->cb,
>>>> +                           drm_sched_job_done_cb);
>>>> +            if (r == -ENOENT)
>>>> +                drm_sched_job_done(sched_job, fence->error);
>>>> +            else if (r)
>>>> +                DRM_DEV_ERROR(sched->dev,
>>>> +                          "fence add callback failed (%d)\n", r);
>>>> +        } else {
>>>> +            drm_sched_job_done(sched_job, IS_ERR(fence) ?
>>>> +                              PTR_ERR(fence) : 0);
>>>> +        }
>>>>    -    sched_job = drm_sched_entity_pop_job(entity);
>>>> -    if (!sched_job) {
>>>>            complete_all(&entity->entity_idle);
>>>> -        drm_sched_run_job_queue(sched);
>>>> -        return;
>>>>        }
>>>>    -    s_fence = sched_job->s_fence;
>>>> -
>>>> -    atomic_add(sched_job->credits, &sched->credit_count);
>>>> -    drm_sched_job_begin(sched_job);
>>>> -
>>>> -    trace_drm_sched_job_run(sched_job, entity);
>>>> -    /*
>>>> -     * The run_job() callback must by definition return a fence whose
>>>> -     * refcount has been incremented for the scheduler already.
>>>> -     */
>>>> -    fence = sched->ops->run_job(sched_job);
>>>> -    complete_all(&entity->entity_idle);
>>>> -    drm_sched_fence_scheduled(s_fence, fence);
>>>> -
>>>> -    if (!IS_ERR_OR_NULL(fence)) {
>>>> -        r = dma_fence_add_callback(fence, &sched_job->cb,
>>>> -                       drm_sched_job_done_cb);
>>>> -        if (r == -ENOENT)
>>>> -            drm_sched_job_done(sched_job, fence->error);
>>>> -        else if (r)
>>>> -            DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
>>>> -
>>>> -        dma_fence_put(fence);
>>>> -    } else {
>>>> -        drm_sched_job_done(sched_job, IS_ERR(fence) ?
>>>> -                   PTR_ERR(fence) : 0);
>>>> -    }
>>>> -
>>>> -    wake_up(&sched->job_scheduled);
>>>> -    drm_sched_run_job_queue(sched);
>>>> +    if (submitted_credits)
>>>> +        wake_up(&sched->job_scheduled);
>>>>    }
>>>>      static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
>>>> index e22f9ff88822..f0afdc0bd417 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_rq.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
>>>> @@ -197,9 +197,7 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
>>>>     *
>>>>     * Find oldest waiting ready entity.
>>>>     *
>>>> - * Return an entity if one is found; return an error-pointer (!NULL) if an
>>>> - * entity was ready, but the scheduler had insufficient credits to accommodate
>>>> - * its job; return NULL, if no ready entity was found.
>>>> + * Return an entity if one is found or NULL if no ready entity was found.
>>>>     */
>>>>    struct drm_sched_entity *
>>>>    drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
>>>> @@ -213,14 +211,6 @@ drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
>>>>              entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
>>>>            if (drm_sched_entity_is_ready(entity)) {
>>>> -            /* If we can't queue yet, preserve the current entity in
>>>> -             * terms of fairness.
>>>> -             */
>>>> -            if (!drm_sched_can_queue(sched, entity)) {
>>>> -                spin_unlock(&rq->lock);
>>>> -                return ERR_PTR(-ENOSPC);
>>>> -            }
>>>> -
>>>>                reinit_completion(&entity->entity_idle);
>>>>                break;
>>>>            }
>>>
>>
> 



* Re: [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation
  2025-07-08 15:31         ` Tvrtko Ursulin
@ 2025-07-08 18:59           ` Matthew Brost
  2025-07-09  8:57           ` Christian König
  1 sibling, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2025-07-08 18:59 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Christian König, dri-devel, intel-xe, amd-gfx, kernel-dev,
	Danilo Krummrich, Philipp Stanner

On Tue, Jul 08, 2025 at 04:31:31PM +0100, Tvrtko Ursulin wrote:
> 
> On 08/07/2025 14:02, Christian König wrote:
> > On 08.07.25 14:54, Tvrtko Ursulin wrote:
> > > 
> > > On 08/07/2025 13:37, Christian König wrote:
> > > > On 08.07.25 11:51, Tvrtko Ursulin wrote:
> > > > > There is no reason to queue just a single job and re-queue the worker
> > > > > if the scheduler can take more.
> > > > 
> > > > That's not correct. This was intentionally avoided.
> > > > 
> > > > If more than just the scheduler is using the single-threaded workqueue, other workers, especially the timeout worker, can jump in and execute first.
> > > > 
> > > > We explicitly removed submitting more than one job in each worker run.
> > > 
> > > I wanted to ask why, but then I had a look to see if anyone actually does this. And I did not find any driver sharing a single threaded workqueue between submit and timeout.
> > > 
> > > The only driver which even passes in the same workqueue for both is PVR, but it is not a single threaded one.
> > > 
> > > Or perhaps I misunderstood what you said. Could you please clarify either way?
> > 
> > You correctly understood that.
> > 
> > The argument was that submitting more than one job in a worker is simply not beneficial and other work items can jump in and execute.
> > 
> > I have no idea if that is actually used or not. You would need to dig up the discussion when we switched from a kernel thread to work items for the full background.
> > 

I think Christian is capturing the gist of the discussion. I originally
had it coded the way Tvrtko did, but got pushback and switched to the
requeue approach. If I recall correctly, at the time the default
workqueue was a system WQ, which we definitely didn’t want to hog.

Now that the default is a dedicated worker, this is less of an issue.
However, technically, a system worker could still be passed in—though it
shouldn't be, since the WQ should be marked with WQ_RECLAIM.
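
(For reference, the flag in question is WQ_MEM_RECLAIM: since the run-job
worker sits in the dma-fence signalling path, its workqueue needs a rescuer
thread so submission can make forward progress under memory pressure.)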

I don’t have a strong opinion either way, so I’m going to stay out of
this one.

Matt

> > But in general, doing as little work as possible in each worker and then re-submitting is usually a good idea.
> 
> From the point of view that a single work item invocation shouldn't hog
> the worker, if the worker is shared, I agree. But what we also want is to
> feed the GPU as fast as possible, i.e. to put the CPU to sleep as quickly
> as possible.
> 
> If we consider drivers with dedicated workqueues per hardware engine, or
> even per userspace context, then especially in those cases I don't see
> the benefit of playing the wq re-queue game.
> 
> Anyway, I can park this patch for now. I *think* it will be easy to drop;
> 15/15 will just need a rebase to cope.
> 
> In the meantime I have collected some stats while running the Cyberpunk
> 2077 benchmark on amdgpu, just to remind myself that it does happen that
> more than one job can be ready to be passed on to the GPU. Number of jobs
> submitted per worker invocation (with this patch):
> 
> 		1	2	3	4	5
> gfx_0.0.0	21315	541	9849	171	0
> comp_1.3.0	3093	9	2	0	0
> comp_1.1.0	3501	46	2	1	0
> comp_1.0.1	3451	46	2	0	0
> sdma0		4400	746	279	481	7
> 
> This is for userspace contexts only. A good number of worker invocations
> submitted three jobs.
> 
> Kernel sdma appears to favour deeper queues even more, but I forgot to
> log counts above two jobs per worker invocation:
> 
> 	1	2	
> sdma0	8009	1913
> 
> I can try to measure the latencies of the worker re-queue approach.
> Another interesting thing would be C-state residencies and CPU power. But
> given that when the scheduler went from kthread to wq it lost the ability
> to queue more than one job, I don't think anyone measured this back then?
> In which case I suspect we don't even know whether some latency or
> efficiency was lost.
> 
> Regards,
> 
> Tvrtko
> 
> > > > > We can simply feed the hardware
> > > > > with as much as it can take in one go and hopefully win some latency.
> > > > > 
> > > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> > > > > Cc: Christian König <christian.koenig@amd.com>
> > > > > Cc: Danilo Krummrich <dakr@kernel.org>
> > > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > > Cc: Philipp Stanner <phasta@kernel.org>
> > > > > ---
> > > > >    drivers/gpu/drm/scheduler/sched_internal.h |   2 -
> > > > >    drivers/gpu/drm/scheduler/sched_main.c     | 132 ++++++++++-----------
> > > > >    drivers/gpu/drm/scheduler/sched_rq.c       |  12 +-
> > > > >    3 files changed, 64 insertions(+), 82 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
> > > > > index 15d78abc48df..1a5c2f255223 100644
> > > > > --- a/drivers/gpu/drm/scheduler/sched_internal.h
> > > > > +++ b/drivers/gpu/drm/scheduler/sched_internal.h
> > > > > @@ -22,8 +22,6 @@ struct drm_sched_entity_stats {
> > > > >        u64        vruntime;
> > > > >    };
> > > > >    -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
> > > > > -             struct drm_sched_entity *entity);
> > > > >    void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
> > > > >      void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
> > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > index 35025edea669..1fb3f1da4821 100644
> > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > @@ -95,35 +95,6 @@ static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
> > > > >        return credits;
> > > > >    }
> > > > >    -/**
> > > > > - * drm_sched_can_queue -- Can we queue more to the hardware?
> > > > > - * @sched: scheduler instance
> > > > > - * @entity: the scheduler entity
> > > > > - *
> > > > > - * Return true if we can push at least one more job from @entity, false
> > > > > - * otherwise.
> > > > > - */
> > > > > -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
> > > > > -             struct drm_sched_entity *entity)
> > > > > -{
> > > > > -    struct drm_sched_job *s_job;
> > > > > -
> > > > > -    s_job = drm_sched_entity_queue_peek(entity);
> > > > > -    if (!s_job)
> > > > > -        return false;
> > > > > -
> > > > > -    /* If a job exceeds the credit limit, truncate it to the credit limit
> > > > > -     * itself to guarantee forward progress.
> > > > > -     */
> > > > > -    if (s_job->credits > sched->credit_limit) {
> > > > > -        dev_WARN(sched->dev,
> > > > > -             "Jobs may not exceed the credit limit, truncate.\n");
> > > > > -        s_job->credits = sched->credit_limit;
> > > > > -    }
> > > > > -
> > > > > -    return drm_sched_available_credits(sched) >= s_job->credits;
> > > > > -}
> > > > > -
> > > > >    /**
> > > > >     * drm_sched_run_job_queue - enqueue run-job work
> > > > >     * @sched: scheduler instance
> > > > > @@ -940,54 +911,77 @@ static void drm_sched_run_job_work(struct work_struct *w)
> > > > >    {
> > > > >        struct drm_gpu_scheduler *sched =
> > > > >            container_of(w, struct drm_gpu_scheduler, work_run_job);
> > > > > +    u32 job_credits, submitted_credits = 0;
> > > > >        struct drm_sched_entity *entity;
> > > > > -    struct dma_fence *fence;
> > > > >        struct drm_sched_fence *s_fence;
> > > > >        struct drm_sched_job *sched_job;
> > > > > -    int r;
> > > > > +    struct dma_fence *fence;
> > > > >    -    /* Find entity with a ready job */
> > > > > -    entity = drm_sched_rq_select_entity(sched, sched->rq);
> > > > > -    if (IS_ERR_OR_NULL(entity))
> > > > > -        return;    /* No more work */
> > > > > +    while (!READ_ONCE(sched->pause_submit)) {
> > > > > +        /* Find entity with a ready job */
> > > > > +        entity = drm_sched_rq_select_entity(sched, sched->rq);
> > > > > +        if (!entity)
> > > > > +            break;    /* No more work */
> > > > > +
> > > > > +        sched_job = drm_sched_entity_queue_peek(entity);
> > > > > +        if (!sched_job) {
> > > > > +            complete_all(&entity->entity_idle);
> > > > > +            continue;
> > > > > +        }
> > > > > +
> > > > > +        job_credits = sched_job->credits;
> > > > > +        /*
> > > > > +         * If a job exceeds the credit limit truncate it to guarantee
> > > > > +         * forward progress.
> > > > > +         */
> > > > > +        if (dev_WARN_ONCE(sched->dev, job_credits > sched->credit_limit,
> > > > > +                  "Jobs may not exceed the credit limit, truncating.\n"))
> > > > > +            job_credits = sched_job->credits = sched->credit_limit;
> > > > > +
> > > > > +        if (job_credits > drm_sched_available_credits(sched)) {
> > > > > +            complete_all(&entity->entity_idle);
> > > > > +            break;
> > > > > +        }
> > > > > +
> > > > > +        sched_job = drm_sched_entity_pop_job(entity);
> > > > > +        if (!sched_job) {
> > > > > +            /* Top entity is not yet runnable after all */
> > > > > +            complete_all(&entity->entity_idle);
> > > > > +            continue;
> > > > > +        }
> > > > > +
> > > > > +        s_fence = sched_job->s_fence;
> > > > > +        drm_sched_job_begin(sched_job);
> > > > > +        trace_drm_sched_job_run(sched_job, entity);
> > > > > +        submitted_credits += job_credits;
> > > > > +        atomic_add(job_credits, &sched->credit_count);
> > > > > +
> > > > > +        fence = sched->ops->run_job(sched_job);
> > > > > +        drm_sched_fence_scheduled(s_fence, fence);
> > > > > +
> > > > > +        if (!IS_ERR_OR_NULL(fence)) {
> > > > > +            int r;
> > > > > +
> > > > > +            /* Drop for original kref_init of the fence */
> > > > > +            dma_fence_put(fence);
> > > > > +
> > > > > +            r = dma_fence_add_callback(fence, &sched_job->cb,
> > > > > +                           drm_sched_job_done_cb);
> > > > > +            if (r == -ENOENT)
> > > > > +                drm_sched_job_done(sched_job, fence->error);
> > > > > +            else if (r)
> > > > > +                DRM_DEV_ERROR(sched->dev,
> > > > > +                          "fence add callback failed (%d)\n", r);
> > > > > +        } else {
> > > > > +            drm_sched_job_done(sched_job, IS_ERR(fence) ?
> > > > > +                              PTR_ERR(fence) : 0);
> > > > > +        }
> > > > >    -    sched_job = drm_sched_entity_pop_job(entity);
> > > > > -    if (!sched_job) {
> > > > >            complete_all(&entity->entity_idle);
> > > > > -        drm_sched_run_job_queue(sched);
> > > > > -        return;
> > > > >        }
> > > > >    -    s_fence = sched_job->s_fence;
> > > > > -
> > > > > -    atomic_add(sched_job->credits, &sched->credit_count);
> > > > > -    drm_sched_job_begin(sched_job);
> > > > > -
> > > > > -    trace_drm_sched_job_run(sched_job, entity);
> > > > > -    /*
> > > > > -     * The run_job() callback must by definition return a fence whose
> > > > > -     * refcount has been incremented for the scheduler already.
> > > > > -     */
> > > > > -    fence = sched->ops->run_job(sched_job);
> > > > > -    complete_all(&entity->entity_idle);
> > > > > -    drm_sched_fence_scheduled(s_fence, fence);
> > > > > -
> > > > > -    if (!IS_ERR_OR_NULL(fence)) {
> > > > > -        r = dma_fence_add_callback(fence, &sched_job->cb,
> > > > > -                       drm_sched_job_done_cb);
> > > > > -        if (r == -ENOENT)
> > > > > -            drm_sched_job_done(sched_job, fence->error);
> > > > > -        else if (r)
> > > > > -            DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
> > > > > -
> > > > > -        dma_fence_put(fence);
> > > > > -    } else {
> > > > > -        drm_sched_job_done(sched_job, IS_ERR(fence) ?
> > > > > -                   PTR_ERR(fence) : 0);
> > > > > -    }
> > > > > -
> > > > > -    wake_up(&sched->job_scheduled);
> > > > > -    drm_sched_run_job_queue(sched);
> > > > > +    if (submitted_credits)
> > > > > +        wake_up(&sched->job_scheduled);
> > > > >    }
> > > > >      static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
> > > > > diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
> > > > > index e22f9ff88822..f0afdc0bd417 100644
> > > > > --- a/drivers/gpu/drm/scheduler/sched_rq.c
> > > > > +++ b/drivers/gpu/drm/scheduler/sched_rq.c
> > > > > @@ -197,9 +197,7 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
> > > > >     *
> > > > >     * Find oldest waiting ready entity.
> > > > >     *
> > > > > - * Return an entity if one is found; return an error-pointer (!NULL) if an
> > > > > - * entity was ready, but the scheduler had insufficient credits to accommodate
> > > > > - * its job; return NULL, if no ready entity was found.
> > > > > + * Return an entity if one is found or NULL if no ready entity was found.
> > > > >     */
> > > > >    struct drm_sched_entity *
> > > > >    drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
> > > > > @@ -213,14 +211,6 @@ drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
> > > > >              entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
> > > > >            if (drm_sched_entity_is_ready(entity)) {
> > > > > -            /* If we can't queue yet, preserve the current entity in
> > > > > -             * terms of fairness.
> > > > > -             */
> > > > > -            if (!drm_sched_can_queue(sched, entity)) {
> > > > > -                spin_unlock(&rq->lock);
> > > > > -                return ERR_PTR(-ENOSPC);
> > > > > -            }
> > > > > -
> > > > >                reinit_completion(&entity->entity_idle);
> > > > >                break;
> > > > >            }
> > > > 
> > > 
> > 
> 


* Re: [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation
  2025-07-08 15:31         ` Tvrtko Ursulin
  2025-07-08 18:59           ` Matthew Brost
@ 2025-07-09  8:57           ` Christian König
  1 sibling, 0 replies; 31+ messages in thread
From: Christian König @ 2025-07-09  8:57 UTC (permalink / raw)
  To: Tvrtko Ursulin, dri-devel
  Cc: intel-xe, amd-gfx, kernel-dev, Danilo Krummrich, Matthew Brost,
	Philipp Stanner

On 08.07.25 17:31, Tvrtko Ursulin wrote:
> 
> On 08/07/2025 14:02, Christian König wrote:
>> On 08.07.25 14:54, Tvrtko Ursulin wrote:
>>>
>>> On 08/07/2025 13:37, Christian König wrote:
>>>> On 08.07.25 11:51, Tvrtko Ursulin wrote:
>>>>> There is no reason to queue just a single job and re-queue the worker
>>>>> if the scheduler can take more.
>>>>
>>>> That's not correct. This was intentionally avoided.
>>>>
>>>> If more than just the scheduler is using the single-threaded workqueue, other workers, especially the timeout worker, can jump in and execute first.
>>>>
>>>> We explicitly removed submitting more than one job in each worker run.
>>>
>>> I wanted to ask why, but then I had a look to see if anyone actually does this. And I did not find any driver sharing a single threaded workqueue between submit and timeout.
>>>
>>> The only driver which even passes in the same workqueue for both is PVR, but it is not a single threaded one.
>>>
>>> Or perhaps I misunderstood what you said. Could you please clarify either way?
>>
>> You correctly understood that.
>>
>> The argument was that submitting more than one job in a worker is simply not beneficial and other work items can jump in and execute.
>>
>> I have no idea if that is actually used or not. You would need to dig up the discussion when we switched from a kernel thread to work items for the full background.
>>
>> But in general, doing as little work as possible in each worker and then re-submitting is usually a good idea.
> 
> From the point of view that a single work item invocation shouldn't hog the worker, if the worker is shared, I agree.


> But what we also want is to feed the GPU as fast as possible, i.e. to put the CPU to sleep as quickly as possible.

Yeah, that I clearly agree with.

> If we consider drivers with dedicated workqueues per hardware engine, or even per userspace context, then especially in those cases I don't see the benefit of playing the wq re-queue game.
> 
> Anyway, I can park this patch for now. I *think* it will be easy to drop; 15/15 will just need a rebase to cope.
> 
> In the meantime I have collected some stats while running the Cyberpunk 2077 benchmark on amdgpu, just to remind myself that it does happen that more than one job can be ready to be passed on to the GPU. Number of jobs submitted per worker invocation (with this patch):
> 
>         1    2    3    4    5
> gfx_0.0.0    21315    541    9849    171    0
> comp_1.3.0    3093    9    2    0    0
> comp_1.1.0    3501    46    2    1    0
> comp_1.0.1    3451    46    2    0    0
> sdma0        4400    746    279    481    7
> 
> This is for userspace contexts only. A good number of worker invocations submitted three jobs.
> 
> Kernel sdma appears to favour deeper queues even more, but I forgot to log counts above two jobs per worker invocation:
> 
>     1    2   
> sdma0    8009    1913
> 
> I can try to measure the latencies of the worker re-queue approach. Another interesting thing would be C-state residencies and CPU power. But given that when the scheduler went from kthread to wq it lost the ability to queue more than one job, I don't think anyone measured this back then? In which case I suspect we don't even know whether some latency or efficiency was lost.

It's been a while, but I do remember that we went back and forth over this when we switched from a kernel thread to work items. I'm honestly not sure how much overhead a work item actually has, but I guess it isn't that much.

But I'm pretty sure that we burn quite a number of CPU cycles in other places, the VMID management for example. It would certainly be nice if we could profile that and see where most CPU cycles are spent.
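
(As a starting point, perf with call-graph recording, e.g. a system-wide
"perf record -a -g" while the workload runs followed by "perf report",
should be enough to show where those cycles go.)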

Regards,
Christian.

> 
> Regards,
> 
> Tvrtko
> 
>>>>> We can simply feed the hardware
>>>>> with as much as it can take in one go and hopefully win some latency.
>>>>>
>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>>> Cc: Christian König <christian.koenig@amd.com>
>>>>> Cc: Danilo Krummrich <dakr@kernel.org>
>>>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>>>> Cc: Philipp Stanner <phasta@kernel.org>
>>>>> ---
>>>>>    drivers/gpu/drm/scheduler/sched_internal.h |   2 -
>>>>>    drivers/gpu/drm/scheduler/sched_main.c     | 132 ++++++++++-----------
>>>>>    drivers/gpu/drm/scheduler/sched_rq.c       |  12 +-
>>>>>    3 files changed, 64 insertions(+), 82 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
>>>>> index 15d78abc48df..1a5c2f255223 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_internal.h
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_internal.h
>>>>> @@ -22,8 +22,6 @@ struct drm_sched_entity_stats {
>>>>>        u64        vruntime;
>>>>>    };
>>>>>    -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
>>>>> -             struct drm_sched_entity *entity);
>>>>>    void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>>>>>      void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 35025edea669..1fb3f1da4821 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -95,35 +95,6 @@ static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
>>>>>        return credits;
>>>>>    }
>>>>>    -/**
>>>>> - * drm_sched_can_queue -- Can we queue more to the hardware?
>>>>> - * @sched: scheduler instance
>>>>> - * @entity: the scheduler entity
>>>>> - *
>>>>> - * Return true if we can push at least one more job from @entity, false
>>>>> - * otherwise.
>>>>> - */
>>>>> -bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
>>>>> -             struct drm_sched_entity *entity)
>>>>> -{
>>>>> -    struct drm_sched_job *s_job;
>>>>> -
>>>>> -    s_job = drm_sched_entity_queue_peek(entity);
>>>>> -    if (!s_job)
>>>>> -        return false;
>>>>> -
>>>>> -    /* If a job exceeds the credit limit, truncate it to the credit limit
>>>>> -     * itself to guarantee forward progress.
>>>>> -     */
>>>>> -    if (s_job->credits > sched->credit_limit) {
>>>>> -        dev_WARN(sched->dev,
>>>>> -             "Jobs may not exceed the credit limit, truncate.\n");
>>>>> -        s_job->credits = sched->credit_limit;
>>>>> -    }
>>>>> -
>>>>> -    return drm_sched_available_credits(sched) >= s_job->credits;
>>>>> -}
>>>>> -
>>>>>    /**
>>>>>     * drm_sched_run_job_queue - enqueue run-job work
>>>>>     * @sched: scheduler instance
>>>>> @@ -940,54 +911,77 @@ static void drm_sched_run_job_work(struct work_struct *w)
>>>>>    {
>>>>>        struct drm_gpu_scheduler *sched =
>>>>>            container_of(w, struct drm_gpu_scheduler, work_run_job);
>>>>> +    u32 job_credits, submitted_credits = 0;
>>>>>        struct drm_sched_entity *entity;
>>>>> -    struct dma_fence *fence;
>>>>>        struct drm_sched_fence *s_fence;
>>>>>        struct drm_sched_job *sched_job;
>>>>> -    int r;
>>>>> +    struct dma_fence *fence;
>>>>>    -    /* Find entity with a ready job */
>>>>> -    entity = drm_sched_rq_select_entity(sched, sched->rq);
>>>>> -    if (IS_ERR_OR_NULL(entity))
>>>>> -        return;    /* No more work */
>>>>> +    while (!READ_ONCE(sched->pause_submit)) {
>>>>> +        /* Find entity with a ready job */
>>>>> +        entity = drm_sched_rq_select_entity(sched, sched->rq);
>>>>> +        if (!entity)
>>>>> +            break;    /* No more work */
>>>>> +
>>>>> +        sched_job = drm_sched_entity_queue_peek(entity);
>>>>> +        if (!sched_job) {
>>>>> +            complete_all(&entity->entity_idle);
>>>>> +            continue;
>>>>> +        }
>>>>> +
>>>>> +        job_credits = sched_job->credits;
>>>>> +        /*
>>>>> +         * If a job exceeds the credit limit truncate it to guarantee
>>>>> +         * forward progress.
>>>>> +         */
>>>>> +        if (dev_WARN_ONCE(sched->dev, job_credits > sched->credit_limit,
>>>>> +                  "Jobs may not exceed the credit limit, truncating.\n"))
>>>>> +            job_credits = sched_job->credits = sched->credit_limit;
>>>>> +
>>>>> +        if (job_credits > drm_sched_available_credits(sched)) {
>>>>> +            complete_all(&entity->entity_idle);
>>>>> +            break;
>>>>> +        }
>>>>> +
>>>>> +        sched_job = drm_sched_entity_pop_job(entity);
>>>>> +        if (!sched_job) {
>>>>> +            /* Top entity is not yet runnable after all */
>>>>> +            complete_all(&entity->entity_idle);
>>>>> +            continue;
>>>>> +        }
>>>>> +
>>>>> +        s_fence = sched_job->s_fence;
>>>>> +        drm_sched_job_begin(sched_job);
>>>>> +        trace_drm_sched_job_run(sched_job, entity);
>>>>> +        submitted_credits += job_credits;
>>>>> +        atomic_add(job_credits, &sched->credit_count);
>>>>> +
>>>>> +        fence = sched->ops->run_job(sched_job);
>>>>> +        drm_sched_fence_scheduled(s_fence, fence);
>>>>> +
>>>>> +        if (!IS_ERR_OR_NULL(fence)) {
>>>>> +            int r;
>>>>> +
>>>>> +            /* Drop for original kref_init of the fence */
>>>>> +            dma_fence_put(fence);
>>>>> +
>>>>> +            r = dma_fence_add_callback(fence, &sched_job->cb,
>>>>> +                           drm_sched_job_done_cb);
>>>>> +            if (r == -ENOENT)
>>>>> +                drm_sched_job_done(sched_job, fence->error);
>>>>> +            else if (r)
>>>>> +                DRM_DEV_ERROR(sched->dev,
>>>>> +                          "fence add callback failed (%d)\n", r);
>>>>> +        } else {
>>>>> +            drm_sched_job_done(sched_job, IS_ERR(fence) ?
>>>>> +                              PTR_ERR(fence) : 0);
>>>>> +        }
>>>>>    -    sched_job = drm_sched_entity_pop_job(entity);
>>>>> -    if (!sched_job) {
>>>>>            complete_all(&entity->entity_idle);
>>>>> -        drm_sched_run_job_queue(sched);
>>>>> -        return;
>>>>>        }
>>>>>    -    s_fence = sched_job->s_fence;
>>>>> -
>>>>> -    atomic_add(sched_job->credits, &sched->credit_count);
>>>>> -    drm_sched_job_begin(sched_job);
>>>>> -
>>>>> -    trace_drm_sched_job_run(sched_job, entity);
>>>>> -    /*
>>>>> -     * The run_job() callback must by definition return a fence whose
>>>>> -     * refcount has been incremented for the scheduler already.
>>>>> -     */
>>>>> -    fence = sched->ops->run_job(sched_job);
>>>>> -    complete_all(&entity->entity_idle);
>>>>> -    drm_sched_fence_scheduled(s_fence, fence);
>>>>> -
>>>>> -    if (!IS_ERR_OR_NULL(fence)) {
>>>>> -        r = dma_fence_add_callback(fence, &sched_job->cb,
>>>>> -                       drm_sched_job_done_cb);
>>>>> -        if (r == -ENOENT)
>>>>> -            drm_sched_job_done(sched_job, fence->error);
>>>>> -        else if (r)
>>>>> -            DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
>>>>> -
>>>>> -        dma_fence_put(fence);
>>>>> -    } else {
>>>>> -        drm_sched_job_done(sched_job, IS_ERR(fence) ?
>>>>> -                   PTR_ERR(fence) : 0);
>>>>> -    }
>>>>> -
>>>>> -    wake_up(&sched->job_scheduled);
>>>>> -    drm_sched_run_job_queue(sched);
>>>>> +    if (submitted_credits)
>>>>> +        wake_up(&sched->job_scheduled);
>>>>>    }
>>>>>      static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
>>>>> index e22f9ff88822..f0afdc0bd417 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_rq.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
>>>>> @@ -197,9 +197,7 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
>>>>>     *
>>>>>     * Find oldest waiting ready entity.
>>>>>     *
>>>>> - * Return an entity if one is found; return an error-pointer (!NULL) if an
>>>>> - * entity was ready, but the scheduler had insufficient credits to accommodate
>>>>> - * its job; return NULL, if no ready entity was found.
>>>>> + * Return an entity if one is found or NULL if no ready entity was found.
>>>>>     */
>>>>>    struct drm_sched_entity *
>>>>>    drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
>>>>> @@ -213,14 +211,6 @@ drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
>>>>>              entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
>>>>>            if (drm_sched_entity_is_ready(entity)) {
>>>>> -            /* If we can't queue yet, preserve the current entity in
>>>>> -             * terms of fairness.
>>>>> -             */
>>>>> -            if (!drm_sched_can_queue(sched, entity)) {
>>>>> -                spin_unlock(&rq->lock);
>>>>> -                return ERR_PTR(-ENOSPC);
>>>>> -            }
>>>>> -
>>>>>                reinit_completion(&entity->entity_idle);
>>>>>                break;
>>>>>            }
>>>>
>>>
>>
> 



* Re: [PATCH v6 01/15] drm/sched: Add some scheduling quality unit tests
  2025-07-08  9:51 ` [PATCH v6 01/15] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
@ 2025-07-09 11:16   ` kernel test robot
  0 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2025-07-09 11:16 UTC (permalink / raw)
  To: Tvrtko Ursulin, dri-devel
  Cc: oe-kbuild-all, intel-xe, amd-gfx, kernel-dev, Tvrtko Ursulin,
	Christian König, Danilo Krummrich, Matthew Brost,
	Philipp Stanner, Pierre-Eric Pelloux-Prayer

Hi Tvrtko,

kernel test robot noticed the following build errors:

[auto build test ERROR on drm-misc/drm-misc-next]
[also build test ERROR on v6.16-rc5 next-20250708]
[cannot apply to drm-exynos/exynos-drm-next linus/master drm/drm-next drm-intel/for-linux-next drm-intel/for-linux-next-fixes drm-tip/drm-tip]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Tvrtko-Ursulin/drm-sched-Add-some-scheduling-quality-unit-tests/20250708-180519
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20250708095147.73366-2-tvrtko.ursulin%40igalia.com
patch subject: [PATCH v6 01/15] drm/sched: Add some scheduling quality unit tests
config: powerpc-randconfig-001-20250709 (https://download.01.org/0day-ci/archive/20250709/202507091842.CV3bHmEt-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250709/202507091842.CV3bHmEt-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507091842.CV3bHmEt-lkp@intel.com/

All errors (new ones prefixed by >>):

   powerpc-linux-ld: drivers/gpu/drm/scheduler/tests/tests_scheduler.o: in function `drm_sched_client_work':
>> drivers/gpu/drm/scheduler/tests/tests_scheduler.c:419: undefined reference to `__divdi3'


vim +419 drivers/gpu/drm/scheduler/tests/tests_scheduler.c

   409	
   410	static void drm_sched_client_work(struct kthread_work *work)
   411	{
   412		struct test_client *client = container_of(work, typeof(*client), work);
   413		const long sync_wait = MAX_SCHEDULE_TIMEOUT;
   414		unsigned int cycle, work_us, period_us;
   415		struct drm_mock_sched_job *job = NULL;
   416	
   417		work_us = client->params.job_cnt * client->params.job_us;
   418		period_us = work_us + client->params.wait_us;
 > 419		client->cycles = DIV_ROUND_UP(ktime_to_us(client->duration), period_us);
   420		client->ideal_duration = us_to_ktime(client->cycles * period_us);
   421	
   422		client->start = ktime_get();
   423	
   424		for (cycle = 0; cycle < client->cycles; cycle++) {
   425			unsigned int batch;
   426			unsigned long us;
   427			ktime_t t;
   428	
   429			if (READ_ONCE(client->done))
   430				break;
   431	
   432			t = ktime_get();
   433			for (batch = 0; batch < client->params.job_cnt; batch++) {
   434				job = drm_mock_sched_job_new(client->test,
   435							     client->entity);
   436				drm_mock_sched_job_set_duration_us(job,
   437								   client->params.job_us);
   438				drm_mock_sched_job_submit(job);
   439			}
   440	
   441			if (client->params.sync)
   442				drm_mock_sched_job_wait_finished(job, sync_wait);
   443	
   444			t = ktime_sub(ktime_get(), t);
   445			us = ktime_to_us(t);
   446			update_stats(&client->cycle_time, cycle, us);
   447			if (ktime_to_us(t) >= (long)work_us)
   448				us = ktime_to_us(t) - work_us;
   449			else if (WARN_ON_ONCE(client->params.sync))
   450				us = 0;
   451			update_stats(&client->latency_time, cycle, us);
   452			WRITE_ONCE(client->cycle, cycle);
   453	
   454			if (READ_ONCE(client->done))
   455				break;
   456	
   457			if (client->params.wait_us)
   458				fsleep(client->params.wait_us);
   459			else
   460				cond_resched();
   461		}
   462	
   463		client->done = drm_mock_sched_job_wait_finished(job, sync_wait);
   464		client->end = ktime_get();
   465	}
   466	
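
(The undefined __divdi3 is the usual 64-bit division problem on 32-bit
powerpc: ktime_to_us() returns s64, so the plain DIV_ROUND_UP() on line 419
expands to a libgcc division helper. The conventional fix, sketched here
with the kernel's 64-bit helpers, would be something along the lines of:

	client->cycles = DIV_ROUND_UP_ULL(ktime_to_us(client->duration),
					  period_us);

or an explicit div_u64() where the dividend is known to be non-negative.)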

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


end of thread

Thread overview: 31+ messages
2025-07-08  9:51 [PATCH v6 00/15] Fair DRM scheduler Tvrtko Ursulin
2025-07-08  9:51 ` [PATCH v6 01/15] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
2025-07-09 11:16   ` kernel test robot
2025-07-08  9:51 ` [PATCH v6 02/15] drm/sched: Add some more " Tvrtko Ursulin
2025-07-08  9:51 ` [PATCH v6 03/15] drm/sched: Avoid double re-lock on the job free path Tvrtko Ursulin
2025-07-08 11:22   ` Philipp Stanner
2025-07-08 12:23     ` Tvrtko Ursulin
2025-07-08  9:51 ` [PATCH v6 04/15] drm/sched: Consolidate drm_sched_job_timedout Tvrtko Ursulin
2025-07-08  9:51 ` [PATCH v6 05/15] drm/sched: Consolidate drm_sched_rq_select_entity_rr Tvrtko Ursulin
2025-07-08 11:31   ` Philipp Stanner
2025-07-08 12:21     ` Danilo Krummrich
2025-07-08 12:23     ` Tvrtko Ursulin
2025-07-08  9:51 ` [PATCH v6 06/15] drm/sched: Implement RR via FIFO Tvrtko Ursulin
2025-07-08  9:51 ` [PATCH v6 07/15] drm/sched: Consolidate entity run queue management Tvrtko Ursulin
2025-07-08  9:51 ` [PATCH v6 08/15] drm/sched: Move run queue related code into a separate file Tvrtko Ursulin
2025-07-08  9:51 ` [PATCH v6 09/15] drm/sched: Free all finished jobs at once Tvrtko Ursulin
2025-07-08  9:51 ` [PATCH v6 10/15] drm/sched: Account entity GPU time Tvrtko Ursulin
2025-07-08  9:51 ` [PATCH v6 11/15] drm/sched: Remove idle entity from tree Tvrtko Ursulin
2025-07-08  9:51 ` [PATCH v6 12/15] drm/sched: Add fair scheduling policy Tvrtko Ursulin
2025-07-08  9:51 ` [PATCH v6 13/15] drm/sched: Remove FIFO and RR and simplify to a single run queue Tvrtko Ursulin
2025-07-08  9:51 ` [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation Tvrtko Ursulin
2025-07-08 12:19   ` Philipp Stanner
2025-07-08 12:28     ` Tvrtko Ursulin
2025-07-08 12:37   ` Christian König
2025-07-08 12:54     ` Tvrtko Ursulin
2025-07-08 13:02       ` Christian König
2025-07-08 15:31         ` Tvrtko Ursulin
2025-07-08 18:59           ` Matthew Brost
2025-07-09  8:57           ` Christian König
2025-07-08  9:51 ` [PATCH v6 15/15] drm/sched: Embed run queue singleton into the scheduler Tvrtko Ursulin
2025-07-08 10:40 ` [PATCH v6 00/15] Fair DRM scheduler Philipp Stanner
