* [RFC v4 00/16] Fair DRM scheduler
@ 2025-04-25 10:20 Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 01/16] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
` (17 more replies)
0 siblings, 18 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Leo Liu, Matthew Brost, Philipp Stanner,
Pierre-Eric Pelloux-Prayer, Michel Dänzer
V4 is quite different from v3 in that I have replaced the deadline + queue-depth
approach with a fair, GPU time based approach. This is because Pierre-Eric found
a viewperf workload on which the queue-depth based approach regressed, while
without queue-depth there was a regression on one of my synthetic workloads
which I was not happy with.
In my experiments the fair scheduler looks solid so let's see how it fares after
wider testing.
At a high level, the main advantages of the series are:
1. Scheduling quality - schedules better than FIFO.
2. Code simplification - no more multiple run queues.
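To make the idea more concrete, here is a minimal illustrative sketch of the
principle, assuming fairness means servicing the runnable entity with the
least accumulated GPU time first. This is not the code from the series (the
real policy is integrated with the existing entity and run queue
infrastructure by the GPU time accounting and fair policy patches later in
the series) and the names below are made up for illustration only:

  #include <linux/list.h>
  #include <linux/types.h>

  struct toy_entity {
          struct list_head link;
          u64 gpu_time_ns;    /* accumulated job runtime on the "GPU" */
          bool ready;         /* has runnable jobs queued */
  };

  /* Pick the runnable entity which has consumed the least GPU time so far. */
  static struct toy_entity *toy_pick_next(struct list_head *entities)
  {
          struct toy_entity *e, *best = NULL;

          list_for_each_entry(e, entities, link) {
                  if (!e->ready)
                          continue;
                  if (!best || e->gpu_time_ns < best->gpu_time_ns)
                          best = e;
          }

          /* Caller charges the finished job's runtime back to gpu_time_ns. */
          return best;
  }

Because every completed job makes its owner more expensive to pick again, no
single client can monopolise the GPU indefinitely, which is the behaviour
visible in the results below.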
First patches add some unit tests which allow for easy evaluation of scheduling
behaviour against different client submission patterns. From there onwards it is
hopefully a natural progression of cleanups, enablers, adding the fair policy,
and finally removing FIFO and RR and simplifying the code base since multiple
run queues are no longer needed.
As a headline result I have tested three simultaneous clients on the Steam Deck:
one instance of the deferredmultisampling Vulkan demo running at low priority,
one normal priority instance of the same demo, and the Unigine Heaven benchmark.
With the FIFO scheduler we can see that the low priority client is completely
starved and the GPU time distribution between the other two clients is uneven:
https://people.igalia.com/tursulin/drm-sched-fair/fifo-starvation.png
Switching to the fair scheduler, GPU time distribution is almost equal and the
low priority client does get a small share of the GPU:
https://people.igalia.com/tursulin/drm-sched-fair/fair-no-starvation.png
Moving on to the synthetic submission patterns, these involve two simultaneous
clients and broadly cover the following categories:
* Deep queue clients
* Hogs versus interactive
* Priority handling
Let's look at the results:
1. Two normal priority deep queue clients.
Each submits one second's worth of 8ms jobs, as fast as it can, with no
dependencies etc. There is no difference in runtime between FIFO and fair, but
the latter allows both clients to progress with their work more evenly:
https://people.igalia.com/tursulin/drm-sched-fair/normal-normal.png
(X axis is time, Y is submitted queue depth, so a falling qd corresponds to
work progress for both clients; each scheduler was tested separately.)
2. Same two clients but one is now low priority.
https://people.igalia.com/tursulin/drm-sched-fair/normal-low.png
The normal priority client is the solid line, the low priority one dotted. We
can see how FIFO completely starves the low priority client until the normal
priority one is fully done; only then does the low priority client get any GPU
time.
In contrast, the fair scheduler allows some GPU time to the low priority client.
3. Same clients but now high versus normal priority.
Behaviour is similar to the previous case, except that normal is de-prioritised
relative to high somewhat less than low was relative to normal.
https://people.igalia.com/tursulin/drm-sched-fair/high-normal.png
4. Heavy load vs interactive client.
The heavy client generates a 75% GPU load in the form of 3x 2.5ms jobs followed
by a 2.5ms wait. The interactive client generates a 10% GPU load in the form of
1x 1ms job followed by a 9ms wait.
This simulates an interactive graphical client running on top of a relatively
heavy background load, but with no GPU oversubscription.
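(For reference, the quoted load figures follow directly from the job and wait
times, ignoring scheduling overheads:

  heavy client:       3 x 2.5ms work per (3 x 2.5ms + 2.5ms) period = 7.5ms / 10ms = 75% GPU load
  interactive client: 1 x 1ms work   per (1 x 1ms + 9ms) period     = 1ms / 10ms   = 10% GPU load)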
The graphs show the interactive client only, and from now on, instead of
looking at the client's queue depth, we look at its "fps".
https://people.igalia.com/tursulin/drm-sched-fair/heavy-interactive.png
We can see that the fair scheduler allows a higher fps for the interactive
client, which is good.
5. An even heavier load vs interactive client.
This one oversubscribes the GPU by submitting 4x 50ms jobs and waiting for only
one microsecond before repeating the cycle. The interactive client is the same
10% one as above.
https://people.igalia.com/tursulin/drm-sched-fair/veryheavy-interactive.png
Here the difference is even more dramatic, with the fair scheduler enabling
~3x the framerate for the interactive client.
6. Low priority GPU hog versus heavy-interactive.
Low priority client: 3x 2.5ms jobs followed by a 0.5ms wait.
Interactive client: 1x 0.5ms job followed by a 10ms wait.
https://people.igalia.com/tursulin/drm-sched-fair/lowhog-interactive.png
A slight win for the fair scheduler, but it could be just noise.
7. The last set of test scenarios has three subgroups.
In all cases we have two interactive (synchronous, single job at a time) clients
with a 50% "duty cycle" GPU time usage.
Client 1: 1.5ms job + 1.5ms wait (aka short bursty)
Client 2: 2.5ms job + 2.5ms wait (aka long bursty)
a) Both normal priority.
https://people.igalia.com/tursulin/drm-sched-fair/5050-short.png
https://people.igalia.com/tursulin/drm-sched-fair/5050-long.png
Both schedulers favour the higher frequency duty cycle, with fair giving it a
little bit more, which should be good for interactivity.
b) Normal vs low priority.
https://people.igalia.com/tursulin/drm-sched-fair/5050-normal-low-normal.png
https://people.igalia.com/tursulin/drm-sched-fair/5050-normal-low-low.png
The fair scheduler gives a bit more GPU time to the normal priority client,
which is again good.
c) High vs normal priority.
https://people.igalia.com/tursulin/drm-sched-fair/5050-high-normal-high.png
https://people.igalia.com/tursulin/drm-sched-fair/5050-high-normal-normal.png
Again, the fair scheduler gives a slightly larger share to the higher priority client.
Overall, the fair policy looks like a potential improvement in terms of
fairness, especially in avoiding priority starvation. There do not appear to be
any regressions with the tested workloads.
As before, I am looking for feedback and for ideas on what kinds of submission
scenarios to test. Testers on different GPUs would be very welcome too.
I should also probably test round-robin at some point, to see whether we are
okay to drop it unconditionally, or whether further work improving fair would
be needed if some use cases rely on round-robin.
v2:
* Fixed many rebase errors.
* Added some new patches.
* Dropped single shot dependency handling.
v3:
* Added scheduling quality unit tests.
* Refined a tiny bit by adding some fairness.
* Dropped a few patches for now.
v4:
* Replaced deadline with fair!
* Refined scheduling quality unit tests.
* Pulled one cleanup patch earlier.
* Fixed "drm/sched: Avoid double re-lock on the job free path".
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
CC: Leo Liu <Leo.Liu@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Michel Dänzer <michel.daenzer@mailbox.org>
Tvrtko Ursulin (16):
drm/sched: Add some scheduling quality unit tests
drm/sched: Add some more scheduling quality unit tests
drm/sched: De-clutter drm_sched_init
drm/sched: Avoid double re-lock on the job free path
drm/sched: Consolidate drm_sched_job_timedout
drm/sched: Consolidate drm_sched_rq_select_entity_rr
drm/sched: Implement RR via FIFO
drm/sched: Consolidate entity run queue management
drm/sched: Move run queue related code into a separate file
drm/sched: Free all finished jobs at once
drm/sched: Account entity GPU time
drm/sched: Remove idle entity from tree
drm/sched: Add fair scheduling policy
drm/sched: Remove FIFO and RR and simplify to a single run queue
drm/sched: Queue all free credits in one worker invocation
drm/sched: Embed run queue singleton into the scheduler
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 27 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 5 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 8 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 8 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 8 +-
drivers/gpu/drm/scheduler/Makefile | 2 +-
drivers/gpu/drm/scheduler/sched_entity.c | 121 +--
drivers/gpu/drm/scheduler/sched_fence.c | 2 +-
drivers/gpu/drm/scheduler/sched_internal.h | 114 ++-
drivers/gpu/drm/scheduler/sched_main.c | 570 +++---------
drivers/gpu/drm/scheduler/sched_rq.c | 214 +++++
drivers/gpu/drm/scheduler/tests/Makefile | 3 +-
.../gpu/drm/scheduler/tests/tests_scheduler.c | 815 ++++++++++++++++++
include/drm/gpu_scheduler.h | 23 +-
15 files changed, 1348 insertions(+), 578 deletions(-)
create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c
create mode 100644 drivers/gpu/drm/scheduler/tests/tests_scheduler.c
--
2.48.0
* [RFC v4 01/16] drm/sched: Add some scheduling quality unit tests
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-04-29 15:03 ` Christian König
2025-04-25 10:20 ` [RFC v4 02/16] drm/sched: Add some more " Tvrtko Ursulin
` (16 subsequent siblings)
17 siblings, 1 reply; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner,
Pierre-Eric Pelloux-Prayer
To make evaluating different scheduling policies easier (no need for
external benchmarks) and perfectly repeatable, let's add some synthetic
workloads built upon the mock scheduler unit test infrastructure.
The focus is on two parallel clients (two threads) submitting different job
patterns and logging their progress and some overall metrics. This is
repeated for scheduler credit limits of both 1 and 2.
Example test output:
Normal and low:
pct1 cps1 qd1; pct2 cps2 qd2
+ 0ms: 0 0 0; 0 0 0
+ 104ms: 100 1240 112; 100 1240 125
+ 209ms: 100 0 99; 100 0 125
+ 313ms: 100 0 86; 100 0 125
+ 419ms: 100 0 73; 100 0 125
+ 524ms: 100 0 60; 100 0 125
+ 628ms: 100 0 47; 100 0 125
+ 731ms: 100 0 34; 100 0 125
+ 836ms: 100 0 21; 100 0 125
+ 939ms: 100 0 8; 100 0 125
+ 1043ms: ; 100 0 120
+ 1147ms: ; 100 0 107
+ 1252ms: ; 100 0 94
+ 1355ms: ; 100 0 81
+ 1459ms: ; 100 0 68
+ 1563ms: ; 100 0 55
+ 1667ms: ; 100 0 42
+ 1771ms: ; 100 0 29
+ 1875ms: ; 100 0 16
+ 1979ms: ; 100 0 3
0: prio=normal sync=0 elapsed_ms=1015ms (ideal_ms=1000ms) cycle_time(min,avg,max)=134,222,978 us latency_time(min,avg,max)=134,222,978
us
1: prio=low sync=0 elapsed_ms=2009ms (ideal_ms=1000ms) cycle_time(min,avg,max)=134,215,806 us latency_time(min,avg,max)=134,215,806 us
There we have two clients represented in the two respective columns, with
their progress logged roughly every 100 milliseconds. The metrics are:
- pct - Percentage progress of the job submit part
- cps - Cycles per second
- qd - Queue depth - number of submitted unfinished jobs
The cycles per second metric is inherent to the fact that workload
patterns are a data driven cycling sequence of:
- Submit 1..N jobs
- Wait for Nth job to finish (optional)
- Sleep (optional)
- Repeat from start
In this particular example we have a normal priority and a low priority
client, both spamming the scheduler with 8ms jobs with no sync and no
sleeping. Hence they build very deep queues and we can see how the low
priority client is completely starved until the normal priority one finishes.
Note that the PCT and CPS metrics are irrelevant for "unsync" clients
since they manage to complete all of their cycles instantaneously.
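As an illustration of how CPS relates to the numbers above (the expression
below is the one used by the test code added in this patch): progress is
sampled roughly every 100ms and the completed cycle delta is scaled to a
per-second rate,

  cps = DIV_ROUND_UP(1000 * (cycle - prev_cycle), 100);

so the 1240 in the +104ms row corresponds to ~124 submission cycles having
gone through in that first sample, i.e. the whole second's worth of 8ms jobs
was queued up almost immediately.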
A different example would be:
Heavy and interactive:
pct1 cps1 qd1; pct2 cps2 qd2
+ 0ms: 0 0 0; 0 0 0
+ 106ms: 5 40 3; 5 40 0
+ 209ms: 9 40 0; 9 40 0
+ 314ms: 14 50 3; 14 50 0
+ 417ms: 18 40 0; 18 40 0
+ 522ms: 23 50 3; 23 50 0
+ 625ms: 27 40 0; 27 40 1
+ 729ms: 32 50 0; 32 50 0
+ 833ms: 36 40 1; 36 40 0
+ 937ms: 40 40 0; 40 40 0
+ 1041ms: 45 50 0; 45 50 0
+ 1146ms: 49 40 1; 49 40 1
+ 1249ms: 54 50 0; 54 50 0
+ 1353ms: 58 40 1; 58 40 0
+ 1457ms: 62 40 0; 62 40 1
+ 1561ms: 67 50 0; 67 50 0
+ 1665ms: 71 40 1; 71 40 0
+ 1772ms: 76 50 0; 76 50 0
+ 1877ms: 80 40 1; 80 40 0
+ 1981ms: 84 40 0; 84 40 0
+ 2085ms: 89 50 0; 89 50 0
+ 2189ms: 93 40 1; 93 40 0
+ 2293ms: 97 40 0; 97 40 1
In this case client one is submitting 3x 2.5ms jobs, waiting for the 3rd
and then sleeping for 2.5ms (in effect causing 75% GPU load, minus the
overheads). The second client is submitting 1ms jobs, waiting for each to
finish and then sleeping for 9ms (an effective 10% GPU load). Here we can see
the PCT and CPS reflecting real progress.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/scheduler/tests/Makefile | 3 +-
.../gpu/drm/scheduler/tests/tests_scheduler.c | 631 ++++++++++++++++++
2 files changed, 633 insertions(+), 1 deletion(-)
create mode 100644 drivers/gpu/drm/scheduler/tests/tests_scheduler.c
diff --git a/drivers/gpu/drm/scheduler/tests/Makefile b/drivers/gpu/drm/scheduler/tests/Makefile
index 5bf707bad373..9ec185fbbc15 100644
--- a/drivers/gpu/drm/scheduler/tests/Makefile
+++ b/drivers/gpu/drm/scheduler/tests/Makefile
@@ -2,6 +2,7 @@
drm-sched-tests-y := \
mock_scheduler.o \
- tests_basic.o
+ tests_basic.o \
+ tests_scheduler.o
obj-$(CONFIG_DRM_SCHED_KUNIT_TEST) += drm-sched-tests.o
diff --git a/drivers/gpu/drm/scheduler/tests/tests_scheduler.c b/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
new file mode 100644
index 000000000000..b66321ef7abe
--- /dev/null
+++ b/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
@@ -0,0 +1,631 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Valve Corporation */
+
+#include <linux/delay.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+
+#include "sched_tests.h"
+
+/*
+ * DRM scheduler scheduling quality tests exercise load balancing decisions,
+ * i.e. entity selection logic.
+ */
+
+static int drm_sched_scheduler_init(struct kunit *test)
+{
+ struct drm_mock_scheduler *sched;
+
+ sched = drm_mock_sched_new(test, MAX_SCHEDULE_TIMEOUT);
+ sched->base.credit_limit = 1;
+
+ test->priv = sched;
+
+ return 0;
+}
+
+static int drm_sched_scheduler_init2(struct kunit *test)
+{
+ struct drm_mock_scheduler *sched;
+
+ sched = drm_mock_sched_new(test, MAX_SCHEDULE_TIMEOUT);
+ sched->base.credit_limit = 2;
+
+ test->priv = sched;
+
+ return 0;
+}
+
+static void drm_sched_scheduler_exit(struct kunit *test)
+{
+ struct drm_mock_scheduler *sched = test->priv;
+
+ drm_mock_sched_fini(sched);
+}
+
+static void drm_sched_scheduler_queue_overhead(struct kunit *test)
+{
+ struct drm_mock_scheduler *sched = test->priv;
+ struct drm_mock_sched_entity *entity;
+ const unsigned int job_us = 1000;
+ const unsigned int jobs = 1000;
+ const unsigned int total_us = jobs * job_us;
+ struct drm_mock_sched_job *job, *first;
+ ktime_t start, end;
+ bool done;
+ int i;
+
+ /*
+ * Deep queue job at a time processing (single credit).
+ *
+ * This measures the overhead of picking and processing a job at a time
+ * by comparing the ideal total "GPU" time of all submitted jobs versus
+ * the time actually taken.
+ */
+
+ KUNIT_ASSERT_EQ(test, sched->base.credit_limit, 1);
+
+ entity = drm_mock_sched_entity_new(test,
+ DRM_SCHED_PRIORITY_NORMAL,
+ sched);
+
+ for (i = 0; i <= jobs; i++) {
+ job = drm_mock_sched_job_new(test, entity);
+ if (i == 0)
+ first = job; /* Extra first job blocks the queue */
+ else
+ drm_mock_sched_job_set_duration_us(job, job_us);
+ drm_mock_sched_job_submit(job);
+ }
+
+ done = drm_mock_sched_job_wait_scheduled(first, HZ);
+ KUNIT_ASSERT_TRUE(test, done);
+
+ start = ktime_get();
+ i = drm_mock_sched_advance(sched, 1); /* Release the queue */
+ KUNIT_ASSERT_EQ(test, i, 1);
+
+ done = drm_mock_sched_job_wait_finished(job,
+ usecs_to_jiffies(total_us) * 5);
+ end = ktime_get();
+ KUNIT_ASSERT_TRUE(test, done);
+
+ pr_info("Expected %uus, actual %lldus\n",
+ total_us,
+ ktime_to_us(ktime_sub(end, start)));
+
+ drm_mock_sched_entity_free(entity);
+}
+
+static void drm_sched_scheduler_ping_pong(struct kunit *test)
+{
+ struct drm_mock_sched_job *job, *first, *prev = NULL;
+ struct drm_mock_scheduler *sched = test->priv;
+ struct drm_mock_sched_entity *entity[2];
+ const unsigned int job_us = 1000;
+ const unsigned int jobs = 1000;
+ const unsigned int total_us = jobs * job_us;
+ ktime_t start, end;
+ bool done;
+ int i;
+
+ /*
+ * Two entities in an inter-dependency chain.
+ *
+ * This measures the overhead of picking and processing a job at a time,
+ * where each job depends on the previous one from the other
+ * entity, by comparing the ideal total "GPU" time of all submitted jobs
+ * versus the time actually taken.
+ */
+
+ KUNIT_ASSERT_EQ(test, sched->base.credit_limit, 1);
+
+ for (i = 0; i < ARRAY_SIZE(entity); i++)
+ entity[i] = drm_mock_sched_entity_new(test,
+ DRM_SCHED_PRIORITY_NORMAL,
+ sched);
+
+ for (i = 0; i <= jobs; i++) {
+ job = drm_mock_sched_job_new(test, entity[i & 1]);
+ if (i == 0)
+ first = job; /* Extra first job blocks the queue */
+ else
+ drm_mock_sched_job_set_duration_us(job, job_us);
+ if (prev)
+ drm_sched_job_add_dependency(&job->base,
+ dma_fence_get(&prev->base.s_fence->finished));
+ drm_mock_sched_job_submit(job);
+ prev = job;
+ }
+
+ done = drm_mock_sched_job_wait_scheduled(first, HZ);
+ KUNIT_ASSERT_TRUE(test, done);
+
+ start = ktime_get();
+ i = drm_mock_sched_advance(sched, 1); /* Release the queue */
+ KUNIT_ASSERT_EQ(test, i, 1);
+
+ done = drm_mock_sched_job_wait_finished(job,
+ usecs_to_jiffies(total_us) * 5);
+ end = ktime_get();
+ KUNIT_ASSERT_TRUE(test, done);
+
+ pr_info("Expected %uus, actual %lldus\n",
+ total_us,
+ ktime_to_us(ktime_sub(end, start)));
+
+ for (i = 0; i < ARRAY_SIZE(entity); i++)
+ drm_mock_sched_entity_free(entity[i]);
+}
+
+static struct kunit_case drm_sched_scheduler_overhead_tests[] = {
+ KUNIT_CASE_SLOW(drm_sched_scheduler_queue_overhead),
+ KUNIT_CASE_SLOW(drm_sched_scheduler_ping_pong),
+ {}
+};
+
+static struct kunit_suite drm_sched_scheduler_overhead = {
+ .name = "drm_sched_scheduler_overhead_tests",
+ .init = drm_sched_scheduler_init,
+ .exit = drm_sched_scheduler_exit,
+ .test_cases = drm_sched_scheduler_overhead_tests,
+};
+
+struct drm_sched_client_params {
+ enum drm_sched_priority priority;
+ unsigned int job_cnt;
+ unsigned int job_us;
+ unsigned int wait_us;
+ bool sync;
+};
+
+struct drm_sched_test_params {
+ const char *description;
+ struct drm_sched_client_params client[2];
+};
+
+static const struct drm_sched_test_params drm_sched_cases[] = {
+ {
+ .description = "Normal and normal",
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 1,
+ .job_us = 8000,
+ .wait_us = 0,
+ .sync = false,
+ },
+ .client[1] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 1,
+ .job_us = 8000,
+ .wait_us = 0,
+ .sync = false,
+ },
+ },
+ {
+ .description = "Normal and low",
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 1,
+ .job_us = 8000,
+ .wait_us = 0,
+ .sync = false,
+ },
+ .client[1] = {
+ .priority = DRM_SCHED_PRIORITY_LOW,
+ .job_cnt = 1,
+ .job_us = 8000,
+ .wait_us = 0,
+ .sync = false,
+ },
+ },
+ {
+ .description = "High and normal",
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_HIGH,
+ .job_cnt = 1,
+ .job_us = 8000,
+ .wait_us = 0,
+ .sync = false,
+ },
+ .client[1] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 1,
+ .job_us = 8000,
+ .wait_us = 0,
+ .sync = false,
+ },
+ },
+ {
+ .description = "High and low",
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_HIGH,
+ .job_cnt = 1,
+ .job_us = 8000,
+ .wait_us = 0,
+ .sync = false,
+ },
+ .client[1] = {
+ .priority = DRM_SCHED_PRIORITY_LOW,
+ .job_cnt = 1,
+ .job_us = 8000,
+ .wait_us = 0,
+ .sync = false,
+ },
+ },
+ {
+ .description = "50 and 50",
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 1,
+ .job_us = 1500,
+ .wait_us = 1500,
+ .sync = true,
+ },
+ .client[1] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 1,
+ .job_us = 2500,
+ .wait_us = 2500,
+ .sync = true,
+ },
+ },
+ {
+ .description = "50 and 50 low",
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 1,
+ .job_us = 1500,
+ .wait_us = 1500,
+ .sync = true,
+ },
+ .client[1] = {
+ .priority = DRM_SCHED_PRIORITY_LOW,
+ .job_cnt = 1,
+ .job_us = 2500,
+ .wait_us = 2500,
+ .sync = true,
+ },
+ },
+ {
+ .description = "50 high and 50",
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_HIGH,
+ .job_cnt = 1,
+ .job_us = 1500,
+ .wait_us = 1500,
+ .sync = true,
+ },
+ .client[1] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 1,
+ .job_us = 2500,
+ .wait_us = 2500,
+ .sync = true,
+ },
+ },
+ {
+ .description = "Low hog and interactive",
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_LOW,
+ .job_cnt = 3,
+ .job_us = 2500,
+ .wait_us = 500,
+ .sync = false,
+ },
+ .client[1] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 1,
+ .job_us = 500,
+ .wait_us = 10000,
+ .sync = true,
+ },
+ },
+ {
+ .description = "Heavy and interactive",
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 3,
+ .job_us = 2500,
+ .wait_us = 2500,
+ .sync = true,
+ },
+ .client[1] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 1,
+ .job_us = 1000,
+ .wait_us = 9000,
+ .sync = true,
+ },
+ },
+ {
+ .description = "Very heavy and interactive",
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 4,
+ .job_us = 50000,
+ .wait_us = 1,
+ .sync = true,
+ },
+ .client[1] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 1,
+ .job_us = 1000,
+ .wait_us = 9000,
+ .sync = true,
+ },
+ },
+};
+
+static void
+drm_sched_desc(const struct drm_sched_test_params *params, char *desc)
+{
+ strscpy(desc, params->description, KUNIT_PARAM_DESC_SIZE);
+}
+
+KUNIT_ARRAY_PARAM(drm_sched_scheduler_two_clients,
+ drm_sched_cases,
+ drm_sched_desc);
+
+struct test_client_stats {
+ unsigned long min_us;
+ unsigned long max_us;
+ unsigned long avg_us;
+};
+
+struct test_client {
+ struct kunit *test;
+
+ struct drm_mock_sched_entity *entity;
+
+ struct kthread_worker *worker;
+ struct kthread_work work;
+
+ unsigned int id;
+ ktime_t duration;
+
+ struct drm_sched_client_params params;
+
+ ktime_t ideal_duration;
+ unsigned int cycles;
+ unsigned int cycle;
+ ktime_t start;
+ ktime_t end;
+ bool done;
+
+ struct test_client_stats cycle_time;
+ struct test_client_stats latency_time;
+};
+
+static void
+update_stats(struct test_client_stats *stats, unsigned int n, unsigned long us)
+{
+ if (us > stats->max_us)
+ stats->max_us = us;
+ if (us < stats->min_us)
+ stats->min_us = us;
+ stats->avg_us = DIV_ROUND_UP(n * stats->avg_us + us, n + 1);
+}
+
+static void drm_sched_client_work(struct kthread_work *work)
+{
+ struct test_client *client = container_of(work, typeof(*client), work);
+ const long sync_wait = MAX_SCHEDULE_TIMEOUT;
+ unsigned int cycle, work_us, period_us;
+ struct drm_mock_sched_job *job = NULL;
+
+ work_us = client->params.job_cnt * client->params.job_us;
+ period_us = work_us + client->params.wait_us;
+ client->cycles = DIV_ROUND_UP(ktime_to_us(client->duration), period_us);
+ client->ideal_duration = us_to_ktime(client->cycles * period_us);
+
+ client->start = ktime_get();
+
+ for (cycle = 0; cycle < client->cycles; cycle++) {
+ unsigned int batch;
+ unsigned long us;
+ ktime_t t;
+
+ if (READ_ONCE(client->done))
+ break;
+
+ t = ktime_get();
+ for (batch = 0; batch < client->params.job_cnt; batch++) {
+ job = drm_mock_sched_job_new(client->test,
+ client->entity);
+ drm_mock_sched_job_set_duration_us(job,
+ client->params.job_us);
+ drm_mock_sched_job_submit(job);
+ }
+
+ if (client->params.sync)
+ drm_mock_sched_job_wait_finished(job, sync_wait);
+
+ t = ktime_sub(ktime_get(), t);
+ us = ktime_to_us(t);
+ update_stats(&client->cycle_time, cycle, us);
+ if (ktime_to_us(t) >= (long)work_us)
+ us = ktime_to_us(t) - work_us;
+ else if (WARN_ON_ONCE(client->params.sync))
+ us = 0;
+ update_stats(&client->latency_time, cycle, us);
+ WRITE_ONCE(client->cycle, cycle);
+
+ if (READ_ONCE(client->done))
+ break;
+
+ if (client->params.wait_us)
+ fsleep(client->params.wait_us);
+ else
+ cond_resched();
+ }
+
+ client->done = drm_mock_sched_job_wait_finished(job, sync_wait);
+ client->end = ktime_get();
+}
+
+static const char *prio_str(enum drm_sched_priority prio)
+{
+ switch (prio) {
+ case DRM_SCHED_PRIORITY_KERNEL:
+ return "kernel";
+ case DRM_SCHED_PRIORITY_LOW:
+ return "low";
+ case DRM_SCHED_PRIORITY_NORMAL:
+ return "normal";
+ case DRM_SCHED_PRIORITY_HIGH:
+ return "high";
+ default:
+ return "???";
+ }
+}
+
+static void drm_sched_scheduler_two_clients_test(struct kunit *test)
+{
+ const struct drm_sched_test_params *params = test->param_value;
+ struct drm_mock_scheduler *sched = test->priv;
+ struct test_client client[2] = { };
+ unsigned int prev_cycle[2] = { };
+ unsigned int i, j;
+ ktime_t start;
+
+ /*
+ * Same job stream from two clients.
+ */
+
+ for (i = 0; i < ARRAY_SIZE(client); i++)
+ client[i].entity =
+ drm_mock_sched_entity_new(test,
+ params->client[i].priority,
+ sched);
+
+ for (i = 0; i < ARRAY_SIZE(client); i++) {
+ client[i].test = test;
+ client[i].id = i;
+ client[i].duration = ms_to_ktime(1000);
+ client[i].params = params->client[i];
+ client[i].cycle_time.min_us = ~0UL;
+ client[i].latency_time.min_us = ~0UL;
+ client[i].worker =
+ kthread_create_worker(0, "%s-%u", __func__, i);
+ if (IS_ERR(client[i].worker)) {
+ for (j = 0; j < i; j++)
+ kthread_destroy_worker(client[j].worker);
+ KUNIT_FAIL(test, "Failed to create worker!\n");
+ }
+
+ kthread_init_work(&client[i].work, drm_sched_client_work);
+ }
+
+ for (i = 0; i < ARRAY_SIZE(client); i++)
+ kthread_queue_work(client[i].worker, &client[i].work);
+
+ /*
+ * The clients (workers) can be a mix of async (deep submission queue),
+ * sync (one job at a time), or something in between. Therefore it is
+ * difficult to display a single metric representing their progress.
+ *
+ * Each struct drm_sched_client_params describes the actual submission
+ * pattern which happens in the following steps:
+ * 1. Submit N jobs
+ * 2. Wait for last submitted job to finish
+ * 3. Sleep for U micro-seconds
+ * 4. Goto 1. for C cycles
+ *
+ * Where number of cycles is calculated to match the target client
+ * duration from the respective struct drm_sched_test_params.
+ *
+ * To assess scheduling behaviour, what we output for both clients is:
+ * - pct: Percentage progress of the jobs submitted
+ * - cps: "Cycles" per second (where one cycle is one 1.-4. above)
+ * - qd: Number of outstanding jobs in the client/entity
+ */
+
+ start = ktime_get();
+ pr_info("%s:\n\t pct1 cps1 qd1; pct2 cps2 qd2\n",
+ params->description);
+ while (!READ_ONCE(client[0].done) || !READ_ONCE(client[1].done)) {
+ unsigned int pct[2], qd[2], cycle[2], cps[2];
+
+ for (i = 0; i < ARRAY_SIZE(client); i++) {
+ qd[i] = spsc_queue_count(&client[i].entity->base.job_queue);
+ cycle[i] = READ_ONCE(client[i].cycle);
+ cps[i] = DIV_ROUND_UP(1000 * (cycle[i] - prev_cycle[i]),
+ 100);
+ if (client[i].cycles)
+ pct[i] = DIV_ROUND_UP(100 * (1 + cycle[i]),
+ client[i].cycles);
+ else
+ pct[i] = 0;
+ prev_cycle[i] = cycle[i];
+ }
+
+ if (READ_ONCE(client[0].done))
+ pr_info("\t+%6lldms: ; %3u %5u %4u\n",
+ ktime_to_ms(ktime_sub(ktime_get(), start)),
+ pct[1], cps[1], qd[1]);
+ else if (READ_ONCE(client[1].done))
+ pr_info("\t+%6lldms: %3u %5u %4u;\n",
+ ktime_to_ms(ktime_sub(ktime_get(), start)),
+ pct[0], cps[0], qd[0]);
+ else
+ pr_info("\t+%6lldms: %3u %5u %4u; %3u %5u %4u\n",
+ ktime_to_ms(ktime_sub(ktime_get(), start)),
+ pct[0], cps[0], qd[0],
+ pct[1], cps[1], qd[1]);
+ msleep(100);
+ }
+
+ for (i = 0; i < ARRAY_SIZE(client); i++) {
+ kthread_flush_work(&client[i].work);
+ kthread_destroy_worker(client[i].worker);
+ }
+
+ for (i = 0; i < ARRAY_SIZE(client); i++)
+ KUNIT_ASSERT_TRUE(test, client[i].done);
+
+ for (i = 0; i < ARRAY_SIZE(client); i++) {
+ pr_info(" %u: prio=%s sync=%u elapsed_ms=%lldms (ideal_ms=%lldms) cycle_time(min,avg,max)=%lu,%lu,%lu us latency_time(min,avg,max)=%lu,%lu,%lu us",
+ i,
+ prio_str(params->client[i].priority),
+ params->client[i].sync,
+ ktime_to_ms(ktime_sub(client[i].end, client[i].start)),
+ ktime_to_ms(client[i].ideal_duration),
+ client[i].cycle_time.min_us,
+ client[i].cycle_time.avg_us,
+ client[i].cycle_time.max_us,
+ client[i].latency_time.min_us,
+ client[i].latency_time.avg_us,
+ client[i].latency_time.max_us);
+ drm_mock_sched_entity_free(client[i].entity);
+ }
+}
+
+static const struct kunit_attributes drm_sched_scheduler_two_clients_attr = {
+ .speed = KUNIT_SPEED_SLOW,
+};
+
+static struct kunit_case drm_sched_scheduler_two_clients_tests[] = {
+ KUNIT_CASE_PARAM_ATTR(drm_sched_scheduler_two_clients_test,
+ drm_sched_scheduler_two_clients_gen_params,
+ drm_sched_scheduler_two_clients_attr),
+ {}
+};
+
+static struct kunit_suite drm_sched_scheduler_two_clients1 = {
+ .name = "drm_sched_scheduler_two_clients_one_credit_tests",
+ .init = drm_sched_scheduler_init,
+ .exit = drm_sched_scheduler_exit,
+ .test_cases = drm_sched_scheduler_two_clients_tests,
+};
+
+static struct kunit_suite drm_sched_scheduler_two_clients2 = {
+ .name = "drm_sched_scheduler_two_clients_two_credits_tests",
+ .init = drm_sched_scheduler_init2,
+ .exit = drm_sched_scheduler_exit,
+ .test_cases = drm_sched_scheduler_two_clients_tests,
+};
+
+kunit_test_suites(&drm_sched_scheduler_overhead,
+ &drm_sched_scheduler_two_clients1,
+ &drm_sched_scheduler_two_clients2);
--
2.48.0
* [RFC v4 02/16] drm/sched: Add some more scheduling quality unit tests
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 01/16] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-04-29 15:07 ` Christian König
2025-04-25 10:20 ` [RFC v4 03/16] drm/sched: De-clutter drm_sched_init Tvrtko Ursulin
` (15 subsequent siblings)
17 siblings, 1 reply; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner,
Pierre-Eric Pelloux-Prayer
This time round we explore the rate of submitted job queue processing
with multiple identical parallel clients.
Example test output:
3 clients:
t cycle: min avg max : ...
+ 0ms 0 0 0 : 0 0 0
+ 102ms 2 2 2 : 2 2 2
+ 208ms 5 6 6 : 6 5 5
+ 310ms 8 9 9 : 9 9 8
...
+ 2616ms 82 83 83 : 83 83 82
+ 2717ms 83 83 83 : 83 83 83
avg_max_min_delta(x100)=60
Every 100ms, for the duration of the test, the test logs how many submission
cycles each client has completed, prefixed by the minimum, average and maximum
across all clients.
When finished, the overall average delta between max and min is output as a
rough indicator of scheduling fairness.
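As an illustration (the expression below is the one used by the test code
added in this patch): on every ~100ms sample the difference between the most
and least progressed client is accumulated, and the final indicator is the
per-sample average scaled by 100,

  avg_max_min_delta(x100) = DIV_ROUND_UP(delta_total * 100, loops);

so the 60 in the example output means the most and least progressed clients
differed by about 0.6 completed cycles per sample on average.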
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
.../gpu/drm/scheduler/tests/tests_scheduler.c | 186 +++++++++++++++++-
1 file changed, 185 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/scheduler/tests/tests_scheduler.c b/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
index b66321ef7abe..d70b47d7bf7a 100644
--- a/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
+++ b/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
@@ -181,6 +181,7 @@ struct drm_sched_client_params {
struct drm_sched_test_params {
const char *description;
+ unsigned int num_clients;
struct drm_sched_client_params client[2];
};
@@ -626,6 +627,189 @@ static struct kunit_suite drm_sched_scheduler_two_clients2 = {
.test_cases = drm_sched_scheduler_two_clients_tests,
};
+
+static const struct drm_sched_test_params drm_sched_many_cases[] = {
+ {
+ .description = "2 clients",
+ .num_clients = 2,
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 4,
+ .job_us = 1000,
+ .wait_us = 0,
+ .sync = true,
+ },
+ },
+ {
+ .description = "3 clients",
+ .num_clients = 3,
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 4,
+ .job_us = 1000,
+ .wait_us = 0,
+ .sync = true,
+ },
+ },
+ {
+ .description = "7 clients",
+ .num_clients = 7,
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 4,
+ .job_us = 1000,
+ .wait_us = 0,
+ .sync = true,
+ },
+ },
+ {
+ .description = "13 clients",
+ .num_clients = 13,
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 4,
+ .job_us = 1000,
+ .wait_us = 0,
+ .sync = true,
+ },
+ },
+ {
+ .description = "31 clients",
+ .num_clients = 31,
+ .client[0] = {
+ .priority = DRM_SCHED_PRIORITY_NORMAL,
+ .job_cnt = 2,
+ .job_us = 1000,
+ .wait_us = 0,
+ .sync = true,
+ },
+ },
+};
+
+KUNIT_ARRAY_PARAM(drm_sched_scheduler_many_clients,
+ drm_sched_many_cases,
+ drm_sched_desc);
+
+static void drm_sched_scheduler_many_clients_test(struct kunit *test)
+{
+ const struct drm_sched_test_params *params = test->param_value;
+ struct drm_mock_scheduler *sched = test->priv;
+ const unsigned int clients = params->num_clients;
+ unsigned int i, j, delta_total = 0, loops = 0;
+ struct test_client *client;
+ unsigned int *prev_cycle;
+ ktime_t start;
+ char *buf;
+
+ /*
+ * Many clients with deep-ish async queues.
+ */
+
+ buf = kunit_kmalloc(test, PAGE_SIZE, GFP_KERNEL);
+ client = kunit_kcalloc(test, clients, sizeof(*client), GFP_KERNEL);
+ prev_cycle = kunit_kcalloc(test, clients, sizeof(*prev_cycle),
+ GFP_KERNEL);
+
+ for (i = 0; i < clients; i++)
+ client[i].entity =
+ drm_mock_sched_entity_new(test,
+ DRM_SCHED_PRIORITY_NORMAL,
+ sched);
+
+ for (i = 0; i < clients; i++) {
+ client[i].test = test;
+ client[i].id = i;
+ client[i].params = params->client[0];
+ client[i].duration = ms_to_ktime(1000 / clients);
+ client[i].cycle_time.min_us = ~0UL;
+ client[i].latency_time.min_us = ~0UL;
+ client[i].worker =
+ kthread_create_worker(0, "%s-%u", __func__, i);
+ if (IS_ERR(client[i].worker)) {
+ for (j = 0; j < i; j++)
+ kthread_destroy_worker(client[j].worker);
+ KUNIT_FAIL(test, "Failed to create worker!\n");
+ }
+
+ kthread_init_work(&client[i].work, drm_sched_client_work);
+ }
+
+ for (i = 0; i < clients; i++)
+ kthread_queue_work(client[i].worker, &client[i].work);
+
+ start = ktime_get();
+ pr_info("%u clients:\n\tt\t\tcycle:\t min avg max : ...\n", clients);
+ for (;;) {
+ unsigned int min = ~0;
+ unsigned int max = 0;
+ unsigned int total = 0;
+ bool done = true;
+ char pbuf[16];
+
+ memset(buf, 0, PAGE_SIZE);
+ for (i = 0; i < clients; i++) {
+ unsigned int cycle, cycles;
+
+ cycle = READ_ONCE(client[i].cycle);
+ cycles = READ_ONCE(client[i].cycles);
+
+ snprintf(pbuf, sizeof(pbuf), " %3d", cycle);
+ strncat(buf, pbuf, PAGE_SIZE);
+
+ total += cycle;
+ if (cycle < min)
+ min = cycle;
+ if (cycle > max)
+ max = cycle;
+
+ if (!min || (cycle + 1) < cycles)
+ done = false;
+ }
+
+ loops++;
+ delta_total += max - min;
+
+ pr_info("\t+%6lldms\t\t %3u %3u %3u :%s\n",
+ ktime_to_ms(ktime_sub(ktime_get(), start)),
+ min, DIV_ROUND_UP(total, clients), max, buf);
+
+ if (done)
+ break;
+
+ msleep(100);
+ }
+
+ pr_info(" avg_max_min_delta(x100)=%u\n",
+ loops ? DIV_ROUND_UP(delta_total * 100, loops) : 0);
+
+ for (i = 0; i < clients; i++) {
+ kthread_flush_work(&client[i].work);
+ kthread_destroy_worker(client[i].worker);
+ }
+
+ for (i = 0; i < clients; i++)
+ drm_mock_sched_entity_free(client[i].entity);
+}
+
+static const struct kunit_attributes drm_sched_scheduler_many_clients_attr = {
+ .speed = KUNIT_SPEED_SLOW,
+};
+
+static struct kunit_case drm_sched_scheduler_many_clients_tests[] = {
+ KUNIT_CASE_PARAM_ATTR(drm_sched_scheduler_many_clients_test,
+ drm_sched_scheduler_many_clients_gen_params,
+ drm_sched_scheduler_many_clients_attr),
+ {}
+};
+
+static struct kunit_suite drm_sched_scheduler_many_clients = {
+ .name = "drm_sched_scheduler_many_clients_tests",
+ .init = drm_sched_scheduler_init2,
+ .exit = drm_sched_scheduler_exit,
+ .test_cases = drm_sched_scheduler_many_clients_tests,
+};
+
kunit_test_suites(&drm_sched_scheduler_overhead,
&drm_sched_scheduler_two_clients1,
- &drm_sched_scheduler_two_clients2);
+ &drm_sched_scheduler_two_clients2,
+ &drm_sched_scheduler_many_clients);
--
2.48.0
* [RFC v4 03/16] drm/sched: De-clutter drm_sched_init
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 01/16] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 02/16] drm/sched: Add some more " Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-04-29 15:16 ` Christian König
2025-04-25 10:20 ` [RFC v4 04/16] drm/sched: Avoid double re-lock on the job free path Tvrtko Ursulin
` (14 subsequent siblings)
17 siblings, 1 reply; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner
Move work queue allocation into a helper for a more streamlined function
body.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/sched_main.c | 27 +++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index ca5028f7a4e9..86e40157b09b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -83,12 +83,6 @@
#define CREATE_TRACE_POINTS
#include "gpu_scheduler_trace.h"
-#ifdef CONFIG_LOCKDEP
-static struct lockdep_map drm_sched_lockdep_map = {
- .name = "drm_sched_lockdep_map"
-};
-#endif
-
int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
/**
@@ -1258,6 +1252,19 @@ static void drm_sched_run_job_work(struct work_struct *w)
drm_sched_run_job_queue(sched);
}
+static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
+{
+#if (IS_ENABLED(CONFIG_LOCKDEP))
+ static struct lockdep_map map = {
+ .name = "drm_sched_lockdep_map"
+ };
+
+ return alloc_ordered_workqueue_lockdep_map(name, WQ_MEM_RECLAIM, &map);
+#else
+ return alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
+#endif
+}
+
/**
* drm_sched_init - Init a gpu scheduler instance
*
@@ -1298,13 +1305,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
sched->submit_wq = args->submit_wq;
sched->own_submit_wq = false;
} else {
-#ifdef CONFIG_LOCKDEP
- sched->submit_wq = alloc_ordered_workqueue_lockdep_map(args->name,
- WQ_MEM_RECLAIM,
- &drm_sched_lockdep_map);
-#else
- sched->submit_wq = alloc_ordered_workqueue(args->name, WQ_MEM_RECLAIM);
-#endif
+ sched->submit_wq = drm_sched_alloc_wq(args->name);
if (!sched->submit_wq)
return -ENOMEM;
--
2.48.0
* [RFC v4 04/16] drm/sched: Avoid double re-lock on the job free path
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (2 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 03/16] drm/sched: De-clutter drm_sched_init Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-05-12 12:49 ` Philipp Stanner
2025-04-25 10:20 ` [RFC v4 05/16] drm/sched: Consolidate drm_sched_job_timedout Tvrtko Ursulin
` (13 subsequent siblings)
17 siblings, 1 reply; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner
Currently the job free work item will lock sched->job_list_lock a first time
to see if there are any jobs, free a single job, and then lock again to
decide whether to re-queue itself if there are more finished jobs.
Since drm_sched_get_finished_job() already looks at the second job in the
queue, we can simply add the signaled check there and have it return to the
caller whether there are more jobs to free. That way the work item does not
have to lock the list again and repeat the signaled check.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/sched_main.c | 39 +++++++++++---------------
1 file changed, 16 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 86e40157b09b..a45b02fd2af3 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -365,22 +365,6 @@ static void __drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
queue_work(sched->submit_wq, &sched->work_free_job);
}
-/**
- * drm_sched_run_free_queue - enqueue free-job work if ready
- * @sched: scheduler instance
- */
-static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
-{
- struct drm_sched_job *job;
-
- spin_lock(&sched->job_list_lock);
- job = list_first_entry_or_null(&sched->pending_list,
- struct drm_sched_job, list);
- if (job && dma_fence_is_signaled(&job->s_fence->finished))
- __drm_sched_run_free_queue(sched);
- spin_unlock(&sched->job_list_lock);
-}
-
/**
* drm_sched_job_done - complete a job
* @s_job: pointer to the job which is done
@@ -1097,12 +1081,13 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
* drm_sched_get_finished_job - fetch the next finished job to be destroyed
*
* @sched: scheduler instance
+ * @have_more: are there more finished jobs on the list
*
* Returns the next finished job from the pending list (if there is one)
* ready for it to be destroyed.
*/
static struct drm_sched_job *
-drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
+drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool *have_more)
{
struct drm_sched_job *job, *next;
@@ -1110,22 +1095,27 @@ drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
job = list_first_entry_or_null(&sched->pending_list,
struct drm_sched_job, list);
-
if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
/* remove job from pending_list */
list_del_init(&job->list);
/* cancel this job's TO timer */
cancel_delayed_work(&sched->work_tdr);
- /* make the scheduled timestamp more accurate */
+
+ *have_more = false;
next = list_first_entry_or_null(&sched->pending_list,
typeof(*next), list);
-
if (next) {
+ /* make the scheduled timestamp more accurate */
if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
&next->s_fence->scheduled.flags))
next->s_fence->scheduled.timestamp =
dma_fence_timestamp(&job->s_fence->finished);
+
+ if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+ &next->s_fence->finished.flags))
+ *have_more = true;
+
/* start TO timer for next job */
drm_sched_start_timeout(sched);
}
@@ -1184,12 +1174,15 @@ static void drm_sched_free_job_work(struct work_struct *w)
struct drm_gpu_scheduler *sched =
container_of(w, struct drm_gpu_scheduler, work_free_job);
struct drm_sched_job *job;
+ bool have_more;
- job = drm_sched_get_finished_job(sched);
- if (job)
+ job = drm_sched_get_finished_job(sched, &have_more);
+ if (job) {
sched->ops->free_job(job);
+ if (have_more)
+ __drm_sched_run_free_queue(sched);
+ }
- drm_sched_run_free_queue(sched);
drm_sched_run_job_queue(sched);
}
--
2.48.0
* [RFC v4 05/16] drm/sched: Consolidate drm_sched_job_timedout
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (3 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 04/16] drm/sched: Avoid double re-lock on the job free path Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-05-12 12:53 ` Philipp Stanner
2025-04-25 10:20 ` [RFC v4 06/16] drm/sched: Consolidate drm_sched_rq_select_entity_rr Tvrtko Ursulin
` (12 subsequent siblings)
17 siblings, 1 reply; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner
Reduce to a single spin_unlock for a hopefully slightly clearer flow in the
function. It may appear there is a behavioural change, with
drm_sched_start_timeout_unlocked() now not being called if there were
initially no jobs on the pending list and some then appeared after the
unlock. However, if the code relied on the TDR handler restarting itself, it
would already fail to do so whenever a job arrived on the pending list just
after the check.
Also fix one stale comment while touching the function.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/sched_main.c | 37 +++++++++++++-------------
1 file changed, 18 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a45b02fd2af3..a26cc11c8ade 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -516,38 +516,37 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
static void drm_sched_job_timedout(struct work_struct *work)
{
- struct drm_gpu_scheduler *sched;
+ struct drm_gpu_scheduler *sched =
+ container_of(work, struct drm_gpu_scheduler, work_tdr.work);
+ enum drm_gpu_sched_stat status;
struct drm_sched_job *job;
- enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL;
-
- sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
/* Protects against concurrent deletion in drm_sched_get_finished_job */
spin_lock(&sched->job_list_lock);
job = list_first_entry_or_null(&sched->pending_list,
struct drm_sched_job, list);
-
if (job) {
/*
* Remove the bad job so it cannot be freed by concurrent
- * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
- * is parked at which point it's safe.
+ * drm_sched_get_finished_job. It will be reinserted back after
+ * scheduler worker is stopped at which point it's safe.
*/
list_del_init(&job->list);
- spin_unlock(&sched->job_list_lock);
+ }
+ spin_unlock(&sched->job_list_lock);
- status = job->sched->ops->timedout_job(job);
+ if (!job)
+ return;
- /*
- * Guilty job did complete and hence needs to be manually removed
- * See drm_sched_stop doc.
- */
- if (sched->free_guilty) {
- job->sched->ops->free_job(job);
- sched->free_guilty = false;
- }
- } else {
- spin_unlock(&sched->job_list_lock);
+ status = job->sched->ops->timedout_job(job);
+
+ /*
+ * Guilty job did complete and hence needs to be manually removed. See
+ * documentation for drm_sched_stop.
+ */
+ if (sched->free_guilty) {
+ job->sched->ops->free_job(job);
+ sched->free_guilty = false;
}
if (status != DRM_GPU_SCHED_STAT_ENODEV)
--
2.48.0
* [RFC v4 06/16] drm/sched: Consolidate drm_sched_rq_select_entity_rr
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (4 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 05/16] drm/sched: Consolidate drm_sched_job_timedout Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 07/16] drm/sched: Implement RR via FIFO Tvrtko Ursulin
` (11 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner
Extract the two copies of identical code into a common function epilogue to
make the function smaller and more readable.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/sched_main.c | 48 +++++++++++---------------
1 file changed, 20 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a26cc11c8ade..381f556096af 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -262,38 +262,14 @@ drm_sched_rq_select_entity_rr(struct drm_gpu_scheduler *sched,
entity = rq->current_entity;
if (entity) {
list_for_each_entry_continue(entity, &rq->entities, list) {
- if (drm_sched_entity_is_ready(entity)) {
- /* If we can't queue yet, preserve the current
- * entity in terms of fairness.
- */
- if (!drm_sched_can_queue(sched, entity)) {
- spin_unlock(&rq->lock);
- return ERR_PTR(-ENOSPC);
- }
-
- rq->current_entity = entity;
- reinit_completion(&entity->entity_idle);
- spin_unlock(&rq->lock);
- return entity;
- }
+ if (drm_sched_entity_is_ready(entity))
+ goto found;
}
}
list_for_each_entry(entity, &rq->entities, list) {
- if (drm_sched_entity_is_ready(entity)) {
- /* If we can't queue yet, preserve the current entity in
- * terms of fairness.
- */
- if (!drm_sched_can_queue(sched, entity)) {
- spin_unlock(&rq->lock);
- return ERR_PTR(-ENOSPC);
- }
-
- rq->current_entity = entity;
- reinit_completion(&entity->entity_idle);
- spin_unlock(&rq->lock);
- return entity;
- }
+ if (drm_sched_entity_is_ready(entity))
+ goto found;
if (entity == rq->current_entity)
break;
@@ -302,6 +278,22 @@ drm_sched_rq_select_entity_rr(struct drm_gpu_scheduler *sched,
spin_unlock(&rq->lock);
return NULL;
+
+found:
+ if (!drm_sched_can_queue(sched, entity)) {
+ /*
+ * If scheduler cannot take more jobs signal the caller to not
+ * consider lower priority queues.
+ */
+ entity = ERR_PTR(-ENOSPC);
+ } else {
+ rq->current_entity = entity;
+ reinit_completion(&entity->entity_idle);
+ }
+
+ spin_unlock(&rq->lock);
+
+ return entity;
}
/**
--
2.48.0
* [RFC v4 07/16] drm/sched: Implement RR via FIFO
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (5 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 06/16] drm/sched: Consolidate drm_sched_rq_select_entity_rr Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 08/16] drm/sched: Consolidate entity run queue management Tvrtko Ursulin
` (10 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner
Round-robin is the non-default policy and it is unclear how much it is used.
We can notice that it can be implemented using the FIFO data structures if we
simply invent a fake submit timestamp which is monotonically increasing
inside drm_sched_rq instances.
So instead of remembering which was the last entity the scheduler worker
picked, we can bump the picked one to the bottom of the tree, achieving the
same round-robin behaviour.
The advantage is that we can consolidate to a single code path and remove a
bunch of code. The downside is that round-robin mode now needs to take a lock
on the job pop path, but that should not be noticeable.
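To sketch the effect (a hypothetical sequence, not an actual trace), with
three runnable entities A, B and C and the per-rq counter handing out fake
timestamps 1, 2, 3, ... the smallest-timestamp-first FIFO pick order becomes:

  pick A (ts 1), re-queue A with ts 4
  pick B (ts 2), re-queue B with ts 5
  pick C (ts 3), re-queue C with ts 6
  pick A (ts 4), ...

which is plain round-robin among the entities with queued jobs.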
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/sched_entity.c | 45 ++++++++------
drivers/gpu/drm/scheduler/sched_main.c | 76 ++----------------------
include/drm/gpu_scheduler.h | 5 +-
3 files changed, 36 insertions(+), 90 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 9b0122e99b44..bbb7f3d3e3e8 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -469,9 +469,19 @@ drm_sched_job_dependency(struct drm_sched_job *job,
return NULL;
}
+static ktime_t
+drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
+{
+ lockdep_assert_held(&rq->lock);
+
+ rq->rr_deadline = ktime_add_ns(rq->rr_deadline, 1);
+
+ return rq->rr_deadline;
+}
+
struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
{
- struct drm_sched_job *sched_job;
+ struct drm_sched_job *sched_job, *next_job;
sched_job = drm_sched_entity_queue_peek(entity);
if (!sched_job)
@@ -506,21 +516,22 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
* Update the entity's location in the min heap according to
* the timestamp of the next job, if any.
*/
- if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
- struct drm_sched_job *next;
+ next_job = drm_sched_entity_queue_peek(entity);
+ if (next_job) {
+ struct drm_sched_rq *rq;
+ ktime_t ts;
- next = drm_sched_entity_queue_peek(entity);
- if (next) {
- struct drm_sched_rq *rq;
- spin_lock(&entity->lock);
- rq = entity->rq;
- spin_lock(&rq->lock);
- drm_sched_rq_update_fifo_locked(entity, rq,
- next->submit_ts);
- spin_unlock(&rq->lock);
- spin_unlock(&entity->lock);
- }
+ spin_lock(&entity->lock);
+ rq = entity->rq;
+ spin_lock(&rq->lock);
+ if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+ ts = next_job->submit_ts;
+ else
+ ts = drm_sched_rq_get_rr_deadline(rq);
+ drm_sched_rq_update_fifo_locked(entity, rq, ts);
+ spin_unlock(&rq->lock);
+ spin_unlock(&entity->lock);
}
/* Jobs and entities might have different lifecycles. Since we're
@@ -619,9 +630,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
spin_lock(&rq->lock);
drm_sched_rq_add_entity(rq, entity);
-
- if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
- drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
+ if (drm_sched_policy == DRM_SCHED_POLICY_RR)
+ submit_ts = drm_sched_rq_get_rr_deadline(rq);
+ drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 381f556096af..2bac478a50bf 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -184,7 +184,6 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
spin_lock_init(&rq->lock);
INIT_LIST_HEAD(&rq->entities);
rq->rb_tree_root = RB_ROOT_CACHED;
- rq->current_entity = NULL;
rq->sched = sched;
}
@@ -230,74 +229,13 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
atomic_dec(rq->sched->score);
list_del_init(&entity->list);
- if (rq->current_entity == entity)
- rq->current_entity = NULL;
-
- if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
- drm_sched_rq_remove_fifo_locked(entity, rq);
+ drm_sched_rq_remove_fifo_locked(entity, rq);
spin_unlock(&rq->lock);
}
/**
- * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
- *
- * @sched: the gpu scheduler
- * @rq: scheduler run queue to check.
- *
- * Try to find the next ready entity.
- *
- * Return an entity if one is found; return an error-pointer (!NULL) if an
- * entity was ready, but the scheduler had insufficient credits to accommodate
- * its job; return NULL, if no ready entity was found.
- */
-static struct drm_sched_entity *
-drm_sched_rq_select_entity_rr(struct drm_gpu_scheduler *sched,
- struct drm_sched_rq *rq)
-{
- struct drm_sched_entity *entity;
-
- spin_lock(&rq->lock);
-
- entity = rq->current_entity;
- if (entity) {
- list_for_each_entry_continue(entity, &rq->entities, list) {
- if (drm_sched_entity_is_ready(entity))
- goto found;
- }
- }
-
- list_for_each_entry(entity, &rq->entities, list) {
- if (drm_sched_entity_is_ready(entity))
- goto found;
-
- if (entity == rq->current_entity)
- break;
- }
-
- spin_unlock(&rq->lock);
-
- return NULL;
-
-found:
- if (!drm_sched_can_queue(sched, entity)) {
- /*
- * If scheduler cannot take more jobs signal the caller to not
- * consider lower priority queues.
- */
- entity = ERR_PTR(-ENOSPC);
- } else {
- rq->current_entity = entity;
- reinit_completion(&entity->entity_idle);
- }
-
- spin_unlock(&rq->lock);
-
- return entity;
-}
-
-/**
- * drm_sched_rq_select_entity_fifo - Select an entity which provides a job to run
+ * drm_sched_rq_select_entity - Select an entity which provides a job to run
*
* @sched: the gpu scheduler
* @rq: scheduler run queue to check.
@@ -309,8 +247,8 @@ drm_sched_rq_select_entity_rr(struct drm_gpu_scheduler *sched,
* its job; return NULL, if no ready entity was found.
*/
static struct drm_sched_entity *
-drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched,
- struct drm_sched_rq *rq)
+drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
+ struct drm_sched_rq *rq)
{
struct rb_node *rb;
@@ -1052,15 +990,13 @@ void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
static struct drm_sched_entity *
drm_sched_select_entity(struct drm_gpu_scheduler *sched)
{
- struct drm_sched_entity *entity;
+ struct drm_sched_entity *entity = NULL;
int i;
/* Start with the highest priority.
*/
for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
- entity = drm_sched_policy == DRM_SCHED_POLICY_FIFO ?
- drm_sched_rq_select_entity_fifo(sched, sched->sched_rq[i]) :
- drm_sched_rq_select_entity_rr(sched, sched->sched_rq[i]);
+ entity = drm_sched_rq_select_entity(sched, sched->sched_rq[i]);
if (entity)
break;
}
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 1a7e377d4cbb..1073cc569cce 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -239,8 +239,7 @@ struct drm_sched_entity {
* struct drm_sched_rq - queue of entities to be scheduled.
*
* @sched: the scheduler to which this rq belongs to.
- * @lock: protects @entities, @rb_tree_root and @current_entity.
- * @current_entity: the entity which is to be scheduled.
+ * @lock: protects @entities, @rb_tree_root and @rr_deadline.
* @entities: list of the entities to be scheduled.
* @rb_tree_root: root of time based priority queue of entities for FIFO scheduling
*
@@ -253,7 +252,7 @@ struct drm_sched_rq {
spinlock_t lock;
/* Following members are protected by the @lock: */
- struct drm_sched_entity *current_entity;
+ ktime_t rr_deadline;
struct list_head entities;
struct rb_root_cached rb_tree_root;
};
--
2.48.0
* [RFC v4 08/16] drm/sched: Consolidate entity run queue management
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (6 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 07/16] drm/sched: Implement RR via FIFO Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 09/16] drm/sched: Move run queue related code into a separate file Tvrtko Ursulin
` (9 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner
Move the code dealing with entities entering and exiting run queues to
helpers to logically separate it from jobs entering and exiting entities.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/sched_entity.c | 60 ++-------------
drivers/gpu/drm/scheduler/sched_internal.h | 8 +-
drivers/gpu/drm/scheduler/sched_main.c | 87 +++++++++++++++++++---
3 files changed, 83 insertions(+), 72 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index bbb7f3d3e3e8..8362184fe431 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -469,19 +469,9 @@ drm_sched_job_dependency(struct drm_sched_job *job,
return NULL;
}
-static ktime_t
-drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
-{
- lockdep_assert_held(&rq->lock);
-
- rq->rr_deadline = ktime_add_ns(rq->rr_deadline, 1);
-
- return rq->rr_deadline;
-}
-
struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
{
- struct drm_sched_job *sched_job, *next_job;
+ struct drm_sched_job *sched_job;
sched_job = drm_sched_entity_queue_peek(entity);
if (!sched_job)
@@ -512,27 +502,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
spsc_queue_pop(&entity->job_queue);
- /*
- * Update the entity's location in the min heap according to
- * the timestamp of the next job, if any.
- */
- next_job = drm_sched_entity_queue_peek(entity);
- if (next_job) {
- struct drm_sched_rq *rq;
- ktime_t ts;
-
- if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
- ts = next_job->submit_ts;
- else
- ts = drm_sched_rq_get_rr_deadline(rq);
-
- spin_lock(&entity->lock);
- rq = entity->rq;
- spin_lock(&rq->lock);
- drm_sched_rq_update_fifo_locked(entity, rq, ts);
- spin_unlock(&rq->lock);
- spin_unlock(&entity->lock);
- }
+ drm_sched_rq_pop_entity(entity);
/* Jobs and entities might have different lifecycles. Since we're
* removing the job from the entities queue, set the jobs entity pointer
@@ -614,30 +584,10 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
/* first job wakes up scheduler */
if (first) {
struct drm_gpu_scheduler *sched;
- struct drm_sched_rq *rq;
- /* Add the entity to the run queue */
- spin_lock(&entity->lock);
- if (entity->stopped) {
- spin_unlock(&entity->lock);
-
- DRM_ERROR("Trying to push to a killed entity\n");
- return;
- }
-
- rq = entity->rq;
- sched = rq->sched;
-
- spin_lock(&rq->lock);
- drm_sched_rq_add_entity(rq, entity);
- if (drm_sched_policy == DRM_SCHED_POLICY_RR)
- submit_ts = drm_sched_rq_get_rr_deadline(rq);
- drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
-
- spin_unlock(&rq->lock);
- spin_unlock(&entity->lock);
-
- drm_sched_wakeup(sched);
+ sched = drm_sched_rq_add_entity(entity, submit_ts);
+ if (sched)
+ drm_sched_wakeup(sched);
}
}
EXPORT_SYMBOL(drm_sched_entity_push_job);
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index 599cf6e1bb74..8e7e477bace3 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -12,13 +12,11 @@ extern int drm_sched_policy;
void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
-void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
- struct drm_sched_entity *entity);
+struct drm_gpu_scheduler *
+drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts);
void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
struct drm_sched_entity *entity);
-
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
- struct drm_sched_rq *rq, ktime_t ts);
+void drm_sched_rq_pop_entity(struct drm_sched_entity *entity);
void drm_sched_entity_select_rq(struct drm_sched_entity *entity);
struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 2bac478a50bf..d3e16e5d2e38 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -144,15 +144,18 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
struct drm_sched_rq *rq)
{
+ lockdep_assert_held(&entity->lock);
+ lockdep_assert_held(&rq->lock);
+
if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
RB_CLEAR_NODE(&entity->rb_tree_node);
}
}
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
- struct drm_sched_rq *rq,
- ktime_t ts)
+static void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+ struct drm_sched_rq *rq,
+ ktime_t ts)
{
/*
* Both locks need to be grabbed, one to protect from entity->rq change
@@ -187,25 +190,58 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
rq->sched = sched;
}
+static ktime_t
+drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
+{
+ lockdep_assert_held(&rq->lock);
+
+ rq->rr_deadline = ktime_add_ns(rq->rr_deadline, 1);
+
+ return rq->rr_deadline;
+}
+
/**
* drm_sched_rq_add_entity - add an entity
*
- * @rq: scheduler run queue
* @entity: scheduler entity
+ * @ts: submission timestamp
*
* Adds a scheduler entity to the run queue.
+ *
+ * Returns a DRM scheduler pre-selected to handle this entity.
*/
-void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
- struct drm_sched_entity *entity)
+struct drm_gpu_scheduler *
+drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts)
{
- lockdep_assert_held(&entity->lock);
- lockdep_assert_held(&rq->lock);
+ struct drm_gpu_scheduler *sched;
+ struct drm_sched_rq *rq;
- if (!list_empty(&entity->list))
- return;
+ /* Add the entity to the run queue */
+ spin_lock(&entity->lock);
+ if (entity->stopped) {
+ spin_unlock(&entity->lock);
- atomic_inc(rq->sched->score);
- list_add_tail(&entity->list, &rq->entities);
+ DRM_ERROR("Trying to push to a killed entity\n");
+ return NULL;
+ }
+
+ rq = entity->rq;
+ spin_lock(&rq->lock);
+ sched = rq->sched;
+
+ if (list_empty(&entity->list)) {
+ atomic_inc(sched->score);
+ list_add_tail(&entity->list, &rq->entities);
+ }
+
+ if (drm_sched_policy == DRM_SCHED_POLICY_RR)
+ ts = drm_sched_rq_get_rr_deadline(rq);
+ drm_sched_rq_update_fifo_locked(entity, rq, ts);
+
+ spin_unlock(&rq->lock);
+ spin_unlock(&entity->lock);
+
+ return sched;
}
/**
@@ -234,6 +270,33 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
spin_unlock(&rq->lock);
}
+void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
+{
+ struct drm_sched_job *next_job;
+ struct drm_sched_rq *rq;
+ ktime_t ts;
+
+ /*
+ * Update the entity's location in the min heap according to
+ * the timestamp of the next job, if any.
+ */
+ next_job = drm_sched_entity_queue_peek(entity);
+ if (!next_job)
+ return;
+
+ if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+ ts = next_job->submit_ts;
+ else
+ ts = drm_sched_rq_get_rr_deadline(rq);
+
+ spin_lock(&entity->lock);
+ rq = entity->rq;
+ spin_lock(&rq->lock);
+ drm_sched_rq_update_fifo_locked(entity, rq, ts);
+ spin_unlock(&rq->lock);
+ spin_unlock(&entity->lock);
+}
+
/**
* drm_sched_rq_select_entity - Select an entity which provides a job to run
*
--
2.48.0
* [RFC v4 09/16] drm/sched: Move run queue related code into a separate file
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (7 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 08/16] drm/sched: Consolidate entity run queue management Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 10/16] drm/sched: Free all finished jobs at once Tvrtko Ursulin
` (8 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner
Let's move all the code dealing with struct drm_sched_rq into a separate
compilation unit. The advantage is that sched_main.c is left with a
clearer set of responsibilities.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/Makefile | 2 +-
drivers/gpu/drm/scheduler/sched_internal.h | 7 +
drivers/gpu/drm/scheduler/sched_main.c | 210 +-------------------
drivers/gpu/drm/scheduler/sched_rq.c | 214 +++++++++++++++++++++
4 files changed, 224 insertions(+), 209 deletions(-)
create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c
diff --git a/drivers/gpu/drm/scheduler/Makefile b/drivers/gpu/drm/scheduler/Makefile
index 6e13e4c63e9d..74e75eff6df5 100644
--- a/drivers/gpu/drm/scheduler/Makefile
+++ b/drivers/gpu/drm/scheduler/Makefile
@@ -20,7 +20,7 @@
# OTHER DEALINGS IN THE SOFTWARE.
#
#
-gpu-sched-y := sched_main.o sched_fence.o sched_entity.o
+gpu-sched-y := sched_main.o sched_fence.o sched_entity.o sched_rq.o
obj-$(CONFIG_DRM_SCHED) += gpu-sched.o
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index 8e7e477bace3..ee13a986b920 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -10,8 +10,15 @@ extern int drm_sched_policy;
#define DRM_SCHED_POLICY_RR 0
#define DRM_SCHED_POLICY_FIFO 1
+bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
+ struct drm_sched_entity *entity);
void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
+void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
+ struct drm_sched_rq *rq);
+struct drm_sched_entity *
+drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
+ struct drm_sched_rq *rq);
struct drm_gpu_scheduler *
drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts);
void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index d3e16e5d2e38..8950c7705f57 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -111,8 +111,8 @@ static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
* Return true if we can push at least one more job from @entity, false
* otherwise.
*/
-static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
- struct drm_sched_entity *entity)
+bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
+ struct drm_sched_entity *entity)
{
struct drm_sched_job *s_job;
@@ -132,212 +132,6 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
return drm_sched_available_credits(sched) >= s_job->credits;
}
-static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
- const struct rb_node *b)
-{
- struct drm_sched_entity *ent_a = rb_entry((a), struct drm_sched_entity, rb_tree_node);
- struct drm_sched_entity *ent_b = rb_entry((b), struct drm_sched_entity, rb_tree_node);
-
- return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting);
-}
-
-static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
- struct drm_sched_rq *rq)
-{
- lockdep_assert_held(&entity->lock);
- lockdep_assert_held(&rq->lock);
-
- if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
- rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
- RB_CLEAR_NODE(&entity->rb_tree_node);
- }
-}
-
-static void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
- struct drm_sched_rq *rq,
- ktime_t ts)
-{
- /*
- * Both locks need to be grabbed, one to protect from entity->rq change
- * for entity from within concurrent drm_sched_entity_select_rq and the
- * other to update the rb tree structure.
- */
- lockdep_assert_held(&entity->lock);
- lockdep_assert_held(&rq->lock);
-
- drm_sched_rq_remove_fifo_locked(entity, rq);
-
- entity->oldest_job_waiting = ts;
-
- rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
- drm_sched_entity_compare_before);
-}
-
-/**
- * drm_sched_rq_init - initialize a given run queue struct
- *
- * @sched: scheduler instance to associate with this run queue
- * @rq: scheduler run queue
- *
- * Initializes a scheduler runqueue.
- */
-static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
- struct drm_sched_rq *rq)
-{
- spin_lock_init(&rq->lock);
- INIT_LIST_HEAD(&rq->entities);
- rq->rb_tree_root = RB_ROOT_CACHED;
- rq->sched = sched;
-}
-
-static ktime_t
-drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
-{
- lockdep_assert_held(&rq->lock);
-
- rq->rr_deadline = ktime_add_ns(rq->rr_deadline, 1);
-
- return rq->rr_deadline;
-}
-
-/**
- * drm_sched_rq_add_entity - add an entity
- *
- * @entity: scheduler entity
- * @ts: submission timestamp
- *
- * Adds a scheduler entity to the run queue.
- *
- * Returns a DRM scheduler pre-selected to handle this entity.
- */
-struct drm_gpu_scheduler *
-drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts)
-{
- struct drm_gpu_scheduler *sched;
- struct drm_sched_rq *rq;
-
- /* Add the entity to the run queue */
- spin_lock(&entity->lock);
- if (entity->stopped) {
- spin_unlock(&entity->lock);
-
- DRM_ERROR("Trying to push to a killed entity\n");
- return NULL;
- }
-
- rq = entity->rq;
- spin_lock(&rq->lock);
- sched = rq->sched;
-
- if (list_empty(&entity->list)) {
- atomic_inc(sched->score);
- list_add_tail(&entity->list, &rq->entities);
- }
-
- if (drm_sched_policy == DRM_SCHED_POLICY_RR)
- ts = drm_sched_rq_get_rr_deadline(rq);
- drm_sched_rq_update_fifo_locked(entity, rq, ts);
-
- spin_unlock(&rq->lock);
- spin_unlock(&entity->lock);
-
- return sched;
-}
-
-/**
- * drm_sched_rq_remove_entity - remove an entity
- *
- * @rq: scheduler run queue
- * @entity: scheduler entity
- *
- * Removes a scheduler entity from the run queue.
- */
-void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
- struct drm_sched_entity *entity)
-{
- lockdep_assert_held(&entity->lock);
-
- if (list_empty(&entity->list))
- return;
-
- spin_lock(&rq->lock);
-
- atomic_dec(rq->sched->score);
- list_del_init(&entity->list);
-
- drm_sched_rq_remove_fifo_locked(entity, rq);
-
- spin_unlock(&rq->lock);
-}
-
-void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
-{
- struct drm_sched_job *next_job;
- struct drm_sched_rq *rq;
- ktime_t ts;
-
- /*
- * Update the entity's location in the min heap according to
- * the timestamp of the next job, if any.
- */
- next_job = drm_sched_entity_queue_peek(entity);
- if (!next_job)
- return;
-
- if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
- ts = next_job->submit_ts;
- else
- ts = drm_sched_rq_get_rr_deadline(rq);
-
- spin_lock(&entity->lock);
- rq = entity->rq;
- spin_lock(&rq->lock);
- drm_sched_rq_update_fifo_locked(entity, rq, ts);
- spin_unlock(&rq->lock);
- spin_unlock(&entity->lock);
-}
-
-/**
- * drm_sched_rq_select_entity - Select an entity which provides a job to run
- *
- * @sched: the gpu scheduler
- * @rq: scheduler run queue to check.
- *
- * Find oldest waiting ready entity.
- *
- * Return an entity if one is found; return an error-pointer (!NULL) if an
- * entity was ready, but the scheduler had insufficient credits to accommodate
- * its job; return NULL, if no ready entity was found.
- */
-static struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
- struct drm_sched_rq *rq)
-{
- struct rb_node *rb;
-
- spin_lock(&rq->lock);
- for (rb = rb_first_cached(&rq->rb_tree_root); rb; rb = rb_next(rb)) {
- struct drm_sched_entity *entity;
-
- entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
- if (drm_sched_entity_is_ready(entity)) {
- /* If we can't queue yet, preserve the current entity in
- * terms of fairness.
- */
- if (!drm_sched_can_queue(sched, entity)) {
- spin_unlock(&rq->lock);
- return ERR_PTR(-ENOSPC);
- }
-
- reinit_completion(&entity->entity_idle);
- break;
- }
- }
- spin_unlock(&rq->lock);
-
- return rb ? rb_entry(rb, struct drm_sched_entity, rb_tree_node) : NULL;
-}
-
/**
* drm_sched_run_job_queue - enqueue run-job work
* @sched: scheduler instance
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
new file mode 100644
index 000000000000..d477a027feb9
--- /dev/null
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -0,0 +1,214 @@
+#include <linux/rbtree.h>
+
+#include <drm/drm_print.h>
+#include <drm/gpu_scheduler.h>
+
+#include "sched_internal.h"
+
+static __always_inline bool
+drm_sched_entity_compare_before(struct rb_node *a, const struct rb_node *b)
+{
+ struct drm_sched_entity *ea =
+ rb_entry((a), struct drm_sched_entity, rb_tree_node);
+ struct drm_sched_entity *eb =
+ rb_entry((b), struct drm_sched_entity, rb_tree_node);
+
+ return ktime_before(ea->oldest_job_waiting, eb->oldest_job_waiting);
+}
+
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
+ struct drm_sched_rq *rq)
+{
+ lockdep_assert_held(&entity->lock);
+ lockdep_assert_held(&rq->lock);
+
+ if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
+ rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
+ RB_CLEAR_NODE(&entity->rb_tree_node);
+ }
+}
+
+static void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+ struct drm_sched_rq *rq,
+ ktime_t ts)
+{
+ /*
+ * Both locks need to be grabbed, one to protect from entity->rq change
+ * for entity from within concurrent drm_sched_entity_select_rq and the
+ * other to update the rb tree structure.
+ */
+ lockdep_assert_held(&entity->lock);
+ lockdep_assert_held(&rq->lock);
+
+ drm_sched_rq_remove_fifo_locked(entity, rq);
+
+ entity->oldest_job_waiting = ts;
+
+ rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
+ drm_sched_entity_compare_before);
+}
+
+/**
+ * drm_sched_rq_init - initialize a given run queue struct
+ *
+ * @sched: scheduler instance to associate with this run queue
+ * @rq: scheduler run queue
+ *
+ * Initializes a scheduler runqueue.
+ */
+void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
+ struct drm_sched_rq *rq)
+{
+ spin_lock_init(&rq->lock);
+ INIT_LIST_HEAD(&rq->entities);
+ rq->rb_tree_root = RB_ROOT_CACHED;
+ rq->sched = sched;
+}
+
+static ktime_t
+drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
+{
+ lockdep_assert_held(&rq->lock);
+
+ rq->rr_deadline = ktime_add_ns(rq->rr_deadline, 1);
+
+ return rq->rr_deadline;
+}
+
+/**
+ * drm_sched_rq_add_entity - add an entity
+ *
+ * @entity: scheduler entity
+ * @ts: submission timestamp
+ *
+ * Adds a scheduler entity to the run queue.
+ *
+ * Returns a DRM scheduler pre-selected to handle this entity.
+ */
+struct drm_gpu_scheduler *
+drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts)
+{
+ struct drm_gpu_scheduler *sched;
+ struct drm_sched_rq *rq;
+
+ /* Add the entity to the run queue */
+ spin_lock(&entity->lock);
+ if (entity->stopped) {
+ spin_unlock(&entity->lock);
+
+ DRM_ERROR("Trying to push to a killed entity\n");
+ return NULL;
+ }
+
+ rq = entity->rq;
+ spin_lock(&rq->lock);
+ sched = rq->sched;
+
+ if (list_empty(&entity->list)) {
+ atomic_inc(sched->score);
+ list_add_tail(&entity->list, &rq->entities);
+ }
+
+ if (drm_sched_policy == DRM_SCHED_POLICY_RR)
+ ts = drm_sched_rq_get_rr_deadline(rq);
+ drm_sched_rq_update_fifo_locked(entity, rq, ts);
+
+ spin_unlock(&rq->lock);
+ spin_unlock(&entity->lock);
+
+ return sched;
+}
+
+/**
+ * drm_sched_rq_remove_entity - remove an entity
+ *
+ * @rq: scheduler run queue
+ * @entity: scheduler entity
+ *
+ * Removes a scheduler entity from the run queue.
+ */
+void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
+ struct drm_sched_entity *entity)
+{
+ lockdep_assert_held(&entity->lock);
+
+ if (list_empty(&entity->list))
+ return;
+
+ spin_lock(&rq->lock);
+
+ atomic_dec(rq->sched->score);
+ list_del_init(&entity->list);
+
+ drm_sched_rq_remove_fifo_locked(entity, rq);
+
+ spin_unlock(&rq->lock);
+}
+
+void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
+{
+ struct drm_sched_job *next_job;
+ struct drm_sched_rq *rq;
+ ktime_t ts;
+
+ /*
+ * Update the entity's location in the min heap according to
+ * the timestamp of the next job, if any.
+ */
+ next_job = drm_sched_entity_queue_peek(entity);
+ if (!next_job)
+ return;
+
+ if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+ ts = next_job->submit_ts;
+ else
+ ts = drm_sched_rq_get_rr_deadline(rq);
+
+ spin_lock(&entity->lock);
+ rq = entity->rq;
+ spin_lock(&rq->lock);
+ drm_sched_rq_update_fifo_locked(entity, rq, ts);
+ spin_unlock(&rq->lock);
+ spin_unlock(&entity->lock);
+}
+
+/**
+ * drm_sched_rq_select_entity - Select an entity which provides a job to run
+ *
+ * @sched: the gpu scheduler
+ * @rq: scheduler run queue to check.
+ *
+ * Find oldest waiting ready entity.
+ *
+ * Return an entity if one is found; return an error-pointer (!NULL) if an
+ * entity was ready, but the scheduler had insufficient credits to accommodate
+ * its job; return NULL, if no ready entity was found.
+ */
+struct drm_sched_entity *
+drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
+ struct drm_sched_rq *rq)
+{
+ struct rb_node *rb;
+
+ spin_lock(&rq->lock);
+ for (rb = rb_first_cached(&rq->rb_tree_root); rb; rb = rb_next(rb)) {
+ struct drm_sched_entity *entity;
+
+ entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
+ if (drm_sched_entity_is_ready(entity)) {
+ /* If we can't queue yet, preserve the current entity in
+ * terms of fairness.
+ */
+ if (!drm_sched_can_queue(sched, entity)) {
+ spin_unlock(&rq->lock);
+ return ERR_PTR(-ENOSPC);
+ }
+
+ reinit_completion(&entity->entity_idle);
+ break;
+ }
+ }
+ spin_unlock(&rq->lock);
+
+ return rb ? rb_entry(rb, struct drm_sched_entity, rb_tree_node) : NULL;
+}
--
2.48.0
* [RFC v4 10/16] drm/sched: Free all finished jobs at once
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (8 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 09/16] drm/sched: Move run queue related code into a separate file Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-05-12 12:56 ` Philipp Stanner
2025-04-25 10:20 ` [RFC v4 11/16] drm/sched: Account entity GPU time Tvrtko Ursulin
` (7 subsequent siblings)
17 siblings, 1 reply; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner
To implement fair scheduling we will need as accurate a view as possible
into per-entity GPU time utilisation. Because sched fence execution times
are only adjusted for accuracy in the free worker, we need to process
completed jobs as soon as possible, so that the metric is as up to date
as possible when viewed from the submission side of things.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/sched_main.c | 15 ++-------------
1 file changed, 2 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 8950c7705f57..22428a1569dd 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -865,13 +865,12 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
* drm_sched_get_finished_job - fetch the next finished job to be destroyed
*
* @sched: scheduler instance
- * @have_more: are there more finished jobs on the list
*
* Returns the next finished job from the pending list (if there is one)
* ready for it to be destroyed.
*/
static struct drm_sched_job *
-drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool *have_more)
+drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
{
struct drm_sched_job *job, *next;
@@ -886,7 +885,6 @@ drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool *have_more)
/* cancel this job's TO timer */
cancel_delayed_work(&sched->work_tdr);
- *have_more = false;
next = list_first_entry_or_null(&sched->pending_list,
typeof(*next), list);
if (next) {
@@ -896,10 +894,6 @@ drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool *have_more)
next->s_fence->scheduled.timestamp =
dma_fence_timestamp(&job->s_fence->finished);
- if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
- &next->s_fence->finished.flags))
- *have_more = true;
-
/* start TO timer for next job */
drm_sched_start_timeout(sched);
}
@@ -958,14 +952,9 @@ static void drm_sched_free_job_work(struct work_struct *w)
struct drm_gpu_scheduler *sched =
container_of(w, struct drm_gpu_scheduler, work_free_job);
struct drm_sched_job *job;
- bool have_more;
- job = drm_sched_get_finished_job(sched, &have_more);
- if (job) {
+ while ((job = drm_sched_get_finished_job(sched)))
sched->ops->free_job(job);
- if (have_more)
- __drm_sched_run_free_queue(sched);
- }
drm_sched_run_job_queue(sched);
}
--
2.48.0
* [RFC v4 11/16] drm/sched: Account entity GPU time
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (9 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 10/16] drm/sched: Free all finished jobs at once Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 12/16] drm/sched: Remove idle entity from tree Tvrtko Ursulin
` (6 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner
To implement fair scheduling we need a view into the GPU time consumed by
entities. The problem is that job and entity objects have decoupled
lifetimes: at the point where we have an accurate view of the GPU time, we
can no longer link back to the entity.
Solve this by adding a lightweight entity stats object which is reference
counted by both the entity and the job, and hence can safely be used from
either side.
With that, the only other thing we need is to add a helper for adding the
job's GPU time into the respective entity stats object, and call it once
the accurate GPU time has been calculated.
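As an illustration of the idea (not the kernel implementation; the names
below are made up for the example), here is a minimal userspace C sketch
of the decoupled-lifetime pattern: both the entity and the job hold a
reference on a shared stats object, so the job can still account its GPU
time even after the entity has been destroyed:
#include <stdio.h>
#include <stdlib.h>
/* Shared, reference counted stats object (userspace stand-in). */
struct stats {
	int refcount;
	long long runtime_ns;
};
static struct stats *stats_get(struct stats *s)
{
	s->refcount++;
	return s;
}
static void stats_put(struct stats *s)
{
	if (--s->refcount == 0)
		free(s);
}
int main(void)
{
	struct stats *entity_stats = calloc(1, sizeof(*entity_stats));
	struct stats *job_stats;
	if (!entity_stats)
		return 1;
	entity_stats->refcount = 1;		/* reference held by the entity */
	job_stats = stats_get(entity_stats);	/* the job takes its own reference */
	stats_put(entity_stats);		/* entity is torn down first */
	job_stats->runtime_ns += 8000000;	/* job finishes later, still safe */
	printf("accounted %lld ns\n", job_stats->runtime_ns);
	stats_put(job_stats);			/* last reference frees the object */
	return 0;
}
In the patch itself this role is played by struct drm_sched_entity_stats
with a kref: a reference is taken in drm_sched_job_arm() and dropped in
drm_sched_job_cleanup(), while the entity drops its own reference in
drm_sched_entity_fini().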
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/sched_entity.c | 29 ++++++++++++++++
drivers/gpu/drm/scheduler/sched_internal.h | 40 ++++++++++++++++++++++
drivers/gpu/drm/scheduler/sched_main.c | 6 +++-
include/drm/gpu_scheduler.h | 5 +++
4 files changed, 79 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 8362184fe431..6431650c3fe7 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -32,6 +32,29 @@
#include "gpu_scheduler_trace.h"
+
+void drm_sched_entity_stats_release(struct kref *kref)
+{
+ struct drm_sched_entity_stats *stats =
+ container_of(kref, typeof(*stats), kref);
+
+ kfree(stats);
+}
+
+static struct drm_sched_entity_stats *drm_sched_entity_stats_alloc(void)
+{
+ struct drm_sched_entity_stats *stats;
+
+ stats = kzalloc(sizeof(*stats), GFP_KERNEL);
+ if (!stats)
+ return NULL;
+
+ kref_init(&stats->kref);
+ spin_lock_init(&stats->lock);
+
+ return stats;
+}
+
/**
* drm_sched_entity_init - Init a context entity used by scheduler when
* submit to HW ring.
@@ -65,6 +88,11 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
return -EINVAL;
memset(entity, 0, sizeof(struct drm_sched_entity));
+
+ entity->stats = drm_sched_entity_stats_alloc();
+ if (!entity->stats)
+ return -ENOMEM;
+
INIT_LIST_HEAD(&entity->list);
entity->rq = NULL;
entity->guilty = guilty;
@@ -339,6 +367,7 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity)
dma_fence_put(rcu_dereference_check(entity->last_scheduled, true));
RCU_INIT_POINTER(entity->last_scheduled, NULL);
+ drm_sched_entity_stats_put(entity->stats);
}
EXPORT_SYMBOL(drm_sched_entity_fini);
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index ee13a986b920..014416cadb3e 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -3,6 +3,15 @@
#ifndef _DRM_GPU_SCHEDULER_INTERNAL_H_
#define _DRM_GPU_SCHEDULER_INTERNAL_H_
+#include <linux/ktime.h>
+#include <linux/kref.h>
+#include <linux/spinlock.h>
+
+struct drm_sched_entity_stats {
+ struct kref kref;
+ spinlock_t lock;
+ ktime_t runtime;
+};
/* Used to choose between FIFO and RR job-scheduling */
extern int drm_sched_policy;
@@ -93,4 +102,35 @@ drm_sched_entity_is_ready(struct drm_sched_entity *entity)
return true;
}
+void drm_sched_entity_stats_release(struct kref *kref);
+
+static inline struct drm_sched_entity_stats *
+drm_sched_entity_stats_get(struct drm_sched_entity_stats *stats)
+{
+ kref_get(&stats->kref);
+
+ return stats;
+}
+
+static inline void
+drm_sched_entity_stats_put(struct drm_sched_entity_stats *stats)
+{
+ kref_put(&stats->kref, drm_sched_entity_stats_release);
+}
+
+static inline void
+drm_sched_entity_stats_job_add_gpu_time(struct drm_sched_job *job)
+{
+ struct drm_sched_entity_stats *stats = job->entity_stats;
+ struct drm_sched_fence *s_fence = job->s_fence;
+ ktime_t start, end;
+
+ start = dma_fence_timestamp(&s_fence->scheduled);
+ end = dma_fence_timestamp(&s_fence->finished);
+
+ spin_lock(&stats->lock);
+ stats->runtime = ktime_add(stats->runtime, ktime_sub(end, start));
+ spin_unlock(&stats->lock);
+}
+
#endif
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 22428a1569dd..e43979ad2fe1 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -615,6 +615,7 @@ void drm_sched_job_arm(struct drm_sched_job *job)
job->sched = sched;
job->s_priority = entity->priority;
job->id = atomic64_inc_return(&sched->job_id_count);
+ job->entity_stats = drm_sched_entity_stats_get(entity->stats);
drm_sched_fence_init(job->s_fence, job->entity);
}
@@ -805,6 +806,7 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
* been called.
*/
dma_fence_put(&job->s_fence->finished);
+ drm_sched_entity_stats_put(job->entity_stats);
} else {
/* The job was aborted before it has been committed to be run;
* notably, drm_sched_job_arm() has not been called.
@@ -953,8 +955,10 @@ static void drm_sched_free_job_work(struct work_struct *w)
container_of(w, struct drm_gpu_scheduler, work_free_job);
struct drm_sched_job *job;
- while ((job = drm_sched_get_finished_job(sched)))
+ while ((job = drm_sched_get_finished_job(sched))) {
+ drm_sched_entity_stats_job_add_gpu_time(job);
sched->ops->free_job(job);
+ }
drm_sched_run_job_queue(sched);
}
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 1073cc569cce..d186e7a8bb1f 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -71,6 +71,8 @@ enum drm_sched_priority {
DRM_SCHED_PRIORITY_COUNT
};
+struct drm_sched_entity_stats;
+
/**
* struct drm_sched_entity - A wrapper around a job queue (typically
* attached to the DRM file_priv).
@@ -109,6 +111,8 @@ struct drm_sched_entity {
*/
struct drm_sched_rq *rq;
+ struct drm_sched_entity_stats *stats;
+
/**
* @sched_list:
*
@@ -351,6 +355,7 @@ struct drm_sched_job {
struct drm_sched_fence *s_fence;
struct drm_sched_entity *entity;
+ struct drm_sched_entity_stats *entity_stats;
enum drm_sched_priority s_priority;
u32 credits;
--
2.48.0
* [RFC v4 12/16] drm/sched: Remove idle entity from tree
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (10 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 11/16] drm/sched: Account entity GPU time Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-05-12 13:03 ` Philipp Stanner
2025-04-25 10:20 ` [RFC v4 13/16] drm/sched: Add fair scheduling policy Tvrtko Ursulin
` (5 subsequent siblings)
17 siblings, 1 reply; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner
There is no need to keep entities with no jobs in the tree, so let's
remove them once their last job is consumed. This keeps the tree smaller,
which is nicer and more efficient, given that entities are removed and
re-added on every popped job.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/sched_rq.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index d477a027feb9..2cde89cf25fb 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -149,25 +149,27 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
{
struct drm_sched_job *next_job;
struct drm_sched_rq *rq;
- ktime_t ts;
/*
* Update the entity's location in the min heap according to
* the timestamp of the next job, if any.
*/
+ spin_lock(&entity->lock);
+ rq = entity->rq;
+ spin_lock(&rq->lock);
next_job = drm_sched_entity_queue_peek(entity);
- if (!next_job)
- return;
+ if (next_job) {
+ ktime_t ts;
- if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
- ts = next_job->submit_ts;
- else
- ts = drm_sched_rq_get_rr_deadline(rq);
+ if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+ ts = next_job->submit_ts;
+ else
+ ts = drm_sched_rq_get_rr_deadline(rq);
- spin_lock(&entity->lock);
- rq = entity->rq;
- spin_lock(&rq->lock);
- drm_sched_rq_update_fifo_locked(entity, rq, ts);
+ drm_sched_rq_update_fifo_locked(entity, rq, ts);
+ } else {
+ drm_sched_rq_remove_fifo_locked(entity, rq);
+ }
spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
}
--
2.48.0
* [RFC v4 13/16] drm/sched: Add fair scheduling policy
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (11 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 12/16] drm/sched: Remove idle entity from tree Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 14/16] drm/sched: Remove FIFO and RR and simplify to a single run queue Tvrtko Ursulin
` (4 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner,
Pierre-Eric Pelloux-Prayer
The fair scheduling policy is built upon the same concepts as the well
known CFS kernel scheduler - the entity run queue is sorted by the virtual
GPU time consumed by entities, so that the entity with the least vruntime
runs first.
It is able to avoid total priority starvation, which is one of the
problems with FIFO, and it also eliminates the need for per-priority run
queues. Because it scales the actual GPU runtime by an exponential factor
as the priority decreases, the virtual runtime for low priority entities
grows faster than for normal priority ones, pushing them further down the
run queue order for the same real GPU time spent.
Apart from this fundamental fairness, the fair policy is especially strong
in oversubscription workloads, where it is able to give more GPU time to
short and bursty workloads running in parallel with GPU-heavy clients
submitting deep job queues.
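To make the exponential scaling concrete, here is a small standalone C
sketch (not scheduler code; the shift values simply mirror the table added
by this patch) showing how much virtual runtime the same one millisecond
of real GPU time costs at each priority level:
#include <stdio.h>
#include <stdint.h>
enum prio { PRIO_KERNEL, PRIO_HIGH, PRIO_NORMAL, PRIO_LOW };
/* Same shifts as the drm_sched_entity_update_vruntime() table. */
static const unsigned int shift[] = {
	[PRIO_KERNEL] = 1,
	[PRIO_HIGH]   = 2,
	[PRIO_NORMAL] = 4,
	[PRIO_LOW]    = 7,
};
int main(void)
{
	const int64_t delta_ns = 1000000;	/* 1ms of real GPU time */
	int p;
	for (p = PRIO_KERNEL; p <= PRIO_LOW; p++)
		printf("prio %d: vruntime advances by %lld ns\n",
		       p, (long long)(delta_ns << shift[p]));
	return 0;
}
With these values a low priority entity accrues eight times the virtual
time of a normal priority one for the same real GPU time (2^7 vs 2^4), so
it sorts correspondingly later in the run queue. In the patch the scaling
happens in drm_sched_entity_update_vruntime(), with entity vruntimes
normalised against the run queue's minimum vruntime as entities leave and
re-join the queue.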
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/scheduler/sched_entity.c | 28 ++++++----
drivers/gpu/drm/scheduler/sched_internal.h | 64 +++++++++++++++++++++-
drivers/gpu/drm/scheduler/sched_main.c | 14 +++--
drivers/gpu/drm/scheduler/sched_rq.c | 35 +++++++++++-
include/drm/gpu_scheduler.h | 3 +
5 files changed, 125 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 6431650c3fe7..4481d5645138 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -98,6 +98,8 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->guilty = guilty;
entity->num_sched_list = num_sched_list;
entity->priority = priority;
+ entity->rq_priority = drm_sched_policy == DRM_SCHED_POLICY_FAIR ?
+ DRM_SCHED_PRIORITY_KERNEL : priority;
/*
* It's perfectly valid to initialize an entity without having a valid
* scheduler attached. It's just not valid to use the scheduler before it
@@ -114,17 +116,23 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
*/
pr_warn("%s: called with uninitialized scheduler\n", __func__);
} else if (num_sched_list) {
- /* The "priority" of an entity cannot exceed the number of run-queues of a
- * scheduler. Protect against num_rqs being 0, by converting to signed. Choose
- * the lowest priority available.
+ enum drm_sched_priority p = entity->priority;
+
+ /*
+ * The "priority" of an entity cannot exceed the number of
+ * run-queues of a scheduler. Protect against num_rqs being 0,
+ * by converting to signed. Choose the lowest priority
+ * available.
*/
- if (entity->priority >= sched_list[0]->num_rqs) {
- dev_err(sched_list[0]->dev, "entity has out-of-bounds priority: %u. num_rqs: %u\n",
- entity->priority, sched_list[0]->num_rqs);
- entity->priority = max_t(s32, (s32) sched_list[0]->num_rqs - 1,
- (s32) DRM_SCHED_PRIORITY_KERNEL);
+ if (p >= sched_list[0]->num_user_rqs) {
+ dev_err(sched_list[0]->dev, "entity with out-of-bounds priority:%u num_user_rqs:%u\n",
+ p, sched_list[0]->num_user_rqs);
+ p = max_t(s32,
+ (s32)sched_list[0]->num_user_rqs - 1,
+ (s32)DRM_SCHED_PRIORITY_KERNEL);
+ entity->priority = p;
}
- entity->rq = sched_list[0]->sched_rq[entity->priority];
+ entity->rq = sched_list[0]->sched_rq[entity->rq_priority];
}
init_completion(&entity->entity_idle);
@@ -572,7 +580,7 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
spin_lock(&entity->lock);
sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
- rq = sched ? sched->sched_rq[entity->priority] : NULL;
+ rq = sched ? sched->sched_rq[entity->rq_priority] : NULL;
if (rq != entity->rq) {
drm_sched_rq_remove_entity(entity->rq, entity);
entity->rq = rq;
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index 014416cadb3e..b675941b58a0 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -11,13 +11,16 @@ struct drm_sched_entity_stats {
struct kref kref;
spinlock_t lock;
ktime_t runtime;
+ ktime_t prev_runtime;
+ u64 vruntime;
};
/* Used to choose between FIFO and RR job-scheduling */
extern int drm_sched_policy;
-#define DRM_SCHED_POLICY_RR 0
-#define DRM_SCHED_POLICY_FIFO 1
+#define DRM_SCHED_POLICY_RR 0
+#define DRM_SCHED_POLICY_FIFO 1
+#define DRM_SCHED_POLICY_FAIR 2
bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
struct drm_sched_entity *entity);
@@ -133,4 +136,61 @@ drm_sched_entity_stats_job_add_gpu_time(struct drm_sched_job *job)
spin_unlock(&stats->lock);
}
+static inline void
+drm_sched_entity_save_vruntime(struct drm_sched_entity *entity,
+ ktime_t min_vruntime)
+{
+ struct drm_sched_entity_stats *stats = entity->stats;
+
+ spin_lock(&stats->lock);
+ stats->vruntime = ktime_sub(stats->vruntime, min_vruntime);
+ spin_unlock(&stats->lock);
+}
+
+static inline ktime_t
+drm_sched_entity_restore_vruntime(struct drm_sched_entity *entity,
+ ktime_t min_vruntime)
+{
+ struct drm_sched_entity_stats *stats = entity->stats;
+ ktime_t vruntime;
+
+ spin_lock(&stats->lock);
+ vruntime = ktime_add(min_vruntime, stats->vruntime);
+ stats->vruntime = vruntime;
+ spin_unlock(&stats->lock);
+
+ return vruntime;
+}
+
+static inline ktime_t
+drm_sched_entity_update_vruntime(struct drm_sched_entity *entity)
+{
+ static const unsigned int shift[] = {
+ [DRM_SCHED_PRIORITY_KERNEL] = 1,
+ [DRM_SCHED_PRIORITY_HIGH] = 2,
+ [DRM_SCHED_PRIORITY_NORMAL] = 4,
+ [DRM_SCHED_PRIORITY_LOW] = 7,
+ };
+ struct drm_sched_entity_stats *stats = entity->stats;
+ ktime_t runtime, prev;
+
+ spin_lock(&stats->lock);
+ prev = stats->prev_runtime;
+ runtime = stats->runtime;
+ stats->prev_runtime = runtime;
+ runtime = ktime_add_ns(stats->vruntime,
+ ktime_to_ns(ktime_sub(runtime, prev)) <<
+ shift[entity->priority]);
+ stats->vruntime = runtime;
+ spin_unlock(&stats->lock);
+
+ return runtime;
+}
+
+static inline ktime_t
+drm_sched_entity_get_job_ts(struct drm_sched_entity *entity)
+{
+ return drm_sched_entity_update_vruntime(entity);
+}
+
#endif
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index e43979ad2fe1..d63c10c19f21 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -83,13 +83,13 @@
#define CREATE_TRACE_POINTS
#include "gpu_scheduler_trace.h"
-int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
+int drm_sched_policy = DRM_SCHED_POLICY_FAIR;
/**
* DOC: sched_policy (int)
* Used to override default entities scheduling policy in a run queue.
*/
-MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
+MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO, " __stringify(DRM_SCHED_POLICY_FAIR) " = Fair (default).");
module_param_named(sched_policy, drm_sched_policy, int, 0444);
static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
@@ -1082,11 +1082,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
sched->own_submit_wq = true;
}
- sched->sched_rq = kmalloc_array(args->num_rqs, sizeof(*sched->sched_rq),
+ sched->num_user_rqs = args->num_rqs;
+ sched->num_rqs = drm_sched_policy != DRM_SCHED_POLICY_FAIR ?
+ args->num_rqs : 1;
+ sched->sched_rq = kmalloc_array(sched->num_rqs,
+ sizeof(*sched->sched_rq),
GFP_KERNEL | __GFP_ZERO);
if (!sched->sched_rq)
goto Out_check_own;
- sched->num_rqs = args->num_rqs;
+
for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
sched->sched_rq[i] = kzalloc(sizeof(*sched->sched_rq[i]), GFP_KERNEL);
if (!sched->sched_rq[i])
@@ -1201,7 +1205,7 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
if (bad->s_priority != DRM_SCHED_PRIORITY_KERNEL) {
atomic_inc(&bad->karma);
- for (i = DRM_SCHED_PRIORITY_HIGH; i < sched->num_rqs; i++) {
+ for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
struct drm_sched_rq *rq = sched->sched_rq[i];
spin_lock(&rq->lock);
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index 2cde89cf25fb..fd1cf89911e6 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -75,6 +75,23 @@ drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
return rq->rr_deadline;
}
+static ktime_t
+drm_sched_rq_get_min_vruntime(struct drm_sched_rq *rq)
+{
+ struct drm_sched_entity *entity;
+ struct rb_node *rb;
+
+ lockdep_assert_held(&rq->lock);
+
+ for (rb = rb_first_cached(&rq->rb_tree_root); rb; rb = rb_next(rb)) {
+ entity = rb_entry(rb, typeof(*entity), rb_tree_node);
+
+ return entity->stats->vruntime; /* Unlocked read */
+ }
+
+ return 0;
+}
+
/**
* drm_sched_rq_add_entity - add an entity
*
@@ -109,8 +126,13 @@ drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts)
list_add_tail(&entity->list, &rq->entities);
}
- if (drm_sched_policy == DRM_SCHED_POLICY_RR)
+ if (drm_sched_policy == DRM_SCHED_POLICY_FAIR) {
+ ts = drm_sched_rq_get_min_vruntime(rq);
+ ts = drm_sched_entity_restore_vruntime(entity, ts);
+ } else if (drm_sched_policy == DRM_SCHED_POLICY_RR) {
ts = drm_sched_rq_get_rr_deadline(rq);
+ }
+
drm_sched_rq_update_fifo_locked(entity, rq, ts);
spin_unlock(&rq->lock);
@@ -161,7 +183,9 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
if (next_job) {
ktime_t ts;
- if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+ if (drm_sched_policy == DRM_SCHED_POLICY_FAIR)
+ ts = drm_sched_entity_get_job_ts(entity);
+ else if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
ts = next_job->submit_ts;
else
ts = drm_sched_rq_get_rr_deadline(rq);
@@ -169,6 +193,13 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
drm_sched_rq_update_fifo_locked(entity, rq, ts);
} else {
drm_sched_rq_remove_fifo_locked(entity, rq);
+
+ if (drm_sched_policy == DRM_SCHED_POLICY_FAIR) {
+ ktime_t min_vruntime;
+
+ min_vruntime = drm_sched_rq_get_min_vruntime(rq);
+ drm_sched_entity_save_vruntime(entity, min_vruntime);
+ }
}
spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d186e7a8bb1f..c6169cbf909b 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -151,6 +151,8 @@ struct drm_sched_entity {
*/
struct spsc_queue job_queue;
+ enum drm_sched_priority rq_priority;
+
/**
* @fence_seq:
*
@@ -556,6 +558,7 @@ struct drm_gpu_scheduler {
long timeout;
const char *name;
u32 num_rqs;
+ u32 num_user_rqs;
struct drm_sched_rq **sched_rq;
wait_queue_head_t job_scheduled;
atomic64_t job_id_count;
--
2.48.0
* [RFC v4 14/16] drm/sched: Remove FIFO and RR and simplify to a single run queue
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (12 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 13/16] drm/sched: Add fair scheduling policy Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 15/16] drm/sched: Queue all free credits in one worker invocation Tvrtko Ursulin
` (3 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner
If the new fair policy is at least as good as FIFO, and we can afford to
remove round-robin, we can simplify the scheduler code by making the
scheduler to run queue relationship always 1:1 and removing some code.
Also, now that the FIFO policy is gone, the tree of entities is not a FIFO
tree any more, so rename it to just the tree.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 23 ++--
drivers/gpu/drm/scheduler/sched_entity.c | 29 +----
drivers/gpu/drm/scheduler/sched_internal.h | 9 +-
drivers/gpu/drm/scheduler/sched_main.c | 133 +++++----------------
drivers/gpu/drm/scheduler/sched_rq.c | 53 +++-----
include/drm/gpu_scheduler.h | 13 +-
6 files changed, 62 insertions(+), 198 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index acb21fc8b3ce..9440af58073b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -459,25 +459,22 @@ drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
{
+ struct drm_sched_rq *rq = sched->rq;
+ struct drm_sched_entity *s_entity;
struct drm_sched_job *s_job;
- struct drm_sched_entity *s_entity = NULL;
- int i;
/* Signal all jobs not yet scheduled */
- for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
- struct drm_sched_rq *rq = sched->sched_rq[i];
- spin_lock(&rq->lock);
- list_for_each_entry(s_entity, &rq->entities, list) {
- while ((s_job = drm_sched_entity_queue_pop(s_entity))) {
- struct drm_sched_fence *s_fence = s_job->s_fence;
+ spin_lock(&rq->lock);
+ list_for_each_entry(s_entity, &rq->entities, list) {
+ while ((s_job = drm_sched_entity_queue_pop(s_entity))) {
+ struct drm_sched_fence *s_fence = s_job->s_fence;
- dma_fence_signal(&s_fence->scheduled);
- dma_fence_set_error(&s_fence->finished, -EHWPOISON);
- dma_fence_signal(&s_fence->finished);
- }
+ dma_fence_signal(&s_fence->scheduled);
+ dma_fence_set_error(&s_fence->finished, -EHWPOISON);
+ dma_fence_signal(&s_fence->finished);
}
- spin_unlock(&rq->lock);
}
+ spin_unlock(&rq->lock);
/* Signal all jobs already scheduled to HW */
list_for_each_entry(s_job, &sched->pending_list, list) {
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 4481d5645138..d149df2a2050 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -98,8 +98,6 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->guilty = guilty;
entity->num_sched_list = num_sched_list;
entity->priority = priority;
- entity->rq_priority = drm_sched_policy == DRM_SCHED_POLICY_FAIR ?
- DRM_SCHED_PRIORITY_KERNEL : priority;
/*
* It's perfectly valid to initialize an entity without having a valid
* scheduler attached. It's just not valid to use the scheduler before it
@@ -109,30 +107,14 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
RCU_INIT_POINTER(entity->last_scheduled, NULL);
RB_CLEAR_NODE(&entity->rb_tree_node);
- if (num_sched_list && !sched_list[0]->sched_rq) {
+ if (num_sched_list && !sched_list[0]->rq) {
/* Since every entry covered by num_sched_list
* should be non-NULL and therefore we warn drivers
* not to do this and to fix their DRM calling order.
*/
pr_warn("%s: called with uninitialized scheduler\n", __func__);
} else if (num_sched_list) {
- enum drm_sched_priority p = entity->priority;
-
- /*
- * The "priority" of an entity cannot exceed the number of
- * run-queues of a scheduler. Protect against num_rqs being 0,
- * by converting to signed. Choose the lowest priority
- * available.
- */
- if (p >= sched_list[0]->num_user_rqs) {
- dev_err(sched_list[0]->dev, "entity with out-of-bounds priority:%u num_user_rqs:%u\n",
- p, sched_list[0]->num_user_rqs);
- p = max_t(s32,
- (s32)sched_list[0]->num_user_rqs - 1,
- (s32)DRM_SCHED_PRIORITY_KERNEL);
- entity->priority = p;
- }
- entity->rq = sched_list[0]->sched_rq[entity->rq_priority];
+ entity->rq = sched_list[0]->rq;
}
init_completion(&entity->entity_idle);
@@ -580,7 +562,7 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
spin_lock(&entity->lock);
sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
- rq = sched ? sched->sched_rq[entity->rq_priority] : NULL;
+ rq = sched ? sched->rq : NULL;
if (rq != entity->rq) {
drm_sched_rq_remove_entity(entity->rq, entity);
entity->rq = rq;
@@ -604,7 +586,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
{
struct drm_sched_entity *entity = sched_job->entity;
bool first;
- ktime_t submit_ts;
trace_drm_sched_job(sched_job, entity);
atomic_inc(entity->rq->sched->score);
@@ -613,16 +594,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
/*
* After the sched_job is pushed into the entity queue, it may be
* completed and freed up at any time. We can no longer access it.
- * Make sure to set the submit_ts first, to avoid a race.
*/
- sched_job->submit_ts = submit_ts = ktime_get();
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
/* first job wakes up scheduler */
if (first) {
struct drm_gpu_scheduler *sched;
- sched = drm_sched_rq_add_entity(entity, submit_ts);
+ sched = drm_sched_rq_add_entity(entity);
if (sched)
drm_sched_wakeup(sched);
}
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index b675941b58a0..2d55f265a092 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -15,13 +15,6 @@ struct drm_sched_entity_stats {
u64 vruntime;
};
-/* Used to choose between FIFO and RR job-scheduling */
-extern int drm_sched_policy;
-
-#define DRM_SCHED_POLICY_RR 0
-#define DRM_SCHED_POLICY_FIFO 1
-#define DRM_SCHED_POLICY_FAIR 2
-
bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
struct drm_sched_entity *entity);
void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
@@ -32,7 +25,7 @@ struct drm_sched_entity *
drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
struct drm_sched_rq *rq);
struct drm_gpu_scheduler *
-drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts);
+drm_sched_rq_add_entity(struct drm_sched_entity *entity);
void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
struct drm_sched_entity *entity);
void drm_sched_rq_pop_entity(struct drm_sched_entity *entity);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index d63c10c19f21..a2be0a097e75 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -83,15 +83,6 @@
#define CREATE_TRACE_POINTS
#include "gpu_scheduler_trace.h"
-int drm_sched_policy = DRM_SCHED_POLICY_FAIR;
-
-/**
- * DOC: sched_policy (int)
- * Used to override default entities scheduling policy in a run queue.
- */
-MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO, " __stringify(DRM_SCHED_POLICY_FAIR) " = Fair (default).");
-module_param_named(sched_policy, drm_sched_policy, int, 0444);
-
static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
{
u32 credits;
@@ -835,34 +826,6 @@ void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
drm_sched_run_job_queue(sched);
}
-/**
- * drm_sched_select_entity - Select next entity to process
- *
- * @sched: scheduler instance
- *
- * Return an entity to process or NULL if none are found.
- *
- * Note, that we break out of the for-loop when "entity" is non-null, which can
- * also be an error-pointer--this assures we don't process lower priority
- * run-queues. See comments in the respectively called functions.
- */
-static struct drm_sched_entity *
-drm_sched_select_entity(struct drm_gpu_scheduler *sched)
-{
- struct drm_sched_entity *entity = NULL;
- int i;
-
- /* Start with the highest priority.
- */
- for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
- entity = drm_sched_rq_select_entity(sched, sched->sched_rq[i]);
- if (entity)
- break;
- }
-
- return IS_ERR(entity) ? NULL : entity;
-}
-
/**
* drm_sched_get_finished_job - fetch the next finished job to be destroyed
*
@@ -979,8 +942,8 @@ static void drm_sched_run_job_work(struct work_struct *w)
int r;
/* Find entity with a ready job */
- entity = drm_sched_select_entity(sched);
- if (!entity)
+ entity = drm_sched_rq_select_entity(sched, sched->rq);
+ if (IS_ERR_OR_NULL(entity))
return; /* No more work */
sched_job = drm_sched_entity_pop_job(entity);
@@ -1045,8 +1008,6 @@ static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
*/
int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_args *args)
{
- int i;
-
sched->ops = args->ops;
sched->credit_limit = args->credit_limit;
sched->name = args->name;
@@ -1056,13 +1017,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
sched->score = args->score ? args->score : &sched->_score;
sched->dev = args->dev;
- if (args->num_rqs > DRM_SCHED_PRIORITY_COUNT) {
- /* This is a gross violation--tell drivers what the problem is.
- */
- dev_err(sched->dev, "%s: num_rqs cannot be greater than DRM_SCHED_PRIORITY_COUNT\n",
- __func__);
- return -EINVAL;
- } else if (sched->sched_rq) {
+ if (sched->rq) {
/* Not an error, but warn anyway so drivers can
* fine-tune their DRM calling order, and return all
* is good.
@@ -1082,21 +1037,11 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
sched->own_submit_wq = true;
}
- sched->num_user_rqs = args->num_rqs;
- sched->num_rqs = drm_sched_policy != DRM_SCHED_POLICY_FAIR ?
- args->num_rqs : 1;
- sched->sched_rq = kmalloc_array(sched->num_rqs,
- sizeof(*sched->sched_rq),
- GFP_KERNEL | __GFP_ZERO);
- if (!sched->sched_rq)
+ sched->rq = kmalloc(sizeof(*sched->rq), GFP_KERNEL | __GFP_ZERO);
+ if (!sched->rq)
goto Out_check_own;
- for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
- sched->sched_rq[i] = kzalloc(sizeof(*sched->sched_rq[i]), GFP_KERNEL);
- if (!sched->sched_rq[i])
- goto Out_unroll;
- drm_sched_rq_init(sched, sched->sched_rq[i]);
- }
+ drm_sched_rq_init(sched, sched->rq);
init_waitqueue_head(&sched->job_scheduled);
INIT_LIST_HEAD(&sched->pending_list);
@@ -1111,12 +1056,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
sched->ready = true;
return 0;
-Out_unroll:
- for (--i ; i >= DRM_SCHED_PRIORITY_KERNEL; i--)
- kfree(sched->sched_rq[i]);
- kfree(sched->sched_rq);
- sched->sched_rq = NULL;
Out_check_own:
if (sched->own_submit_wq)
destroy_workqueue(sched->submit_wq);
@@ -1148,25 +1088,21 @@ EXPORT_SYMBOL(drm_sched_init);
*/
void drm_sched_fini(struct drm_gpu_scheduler *sched)
{
+
+ struct drm_sched_rq *rq = sched->rq;
struct drm_sched_entity *s_entity;
- int i;
drm_sched_wqueue_stop(sched);
- for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
- struct drm_sched_rq *rq = sched->sched_rq[i];
-
- spin_lock(&rq->lock);
- list_for_each_entry(s_entity, &rq->entities, list)
- /*
- * Prevents reinsertion and marks job_queue as idle,
- * it will be removed from the rq in drm_sched_entity_fini()
- * eventually
- */
- s_entity->stopped = true;
- spin_unlock(&rq->lock);
- kfree(sched->sched_rq[i]);
- }
+ spin_lock(&rq->lock);
+ list_for_each_entry(s_entity, &rq->entities, list)
+ /*
+ * Prevents reinsertion and marks job_queue as idle,
+ * it will be removed from the rq in drm_sched_entity_fini()
+ * eventually
+ */
+ s_entity->stopped = true;
+ spin_unlock(&rq->lock);
/* Wakeup everyone stuck in drm_sched_entity_flush for this scheduler */
wake_up_all(&sched->job_scheduled);
@@ -1177,8 +1113,8 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
if (sched->own_submit_wq)
destroy_workqueue(sched->submit_wq);
sched->ready = false;
- kfree(sched->sched_rq);
- sched->sched_rq = NULL;
+ kfree(sched->rq);
+ sched->rq = NULL;
}
EXPORT_SYMBOL(drm_sched_fini);
@@ -1193,35 +1129,28 @@ EXPORT_SYMBOL(drm_sched_fini);
*/
void drm_sched_increase_karma(struct drm_sched_job *bad)
{
- int i;
- struct drm_sched_entity *tmp;
- struct drm_sched_entity *entity;
struct drm_gpu_scheduler *sched = bad->sched;
+ struct drm_sched_entity *entity, *tmp;
+ struct drm_sched_rq *rq = sched->rq;
/* don't change @bad's karma if it's from KERNEL RQ,
* because sometimes GPU hang would cause kernel jobs (like VM updating jobs)
* corrupt but keep in mind that kernel jobs always considered good.
*/
- if (bad->s_priority != DRM_SCHED_PRIORITY_KERNEL) {
- atomic_inc(&bad->karma);
+ if (bad->s_priority == DRM_SCHED_PRIORITY_KERNEL)
+ return;
- for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
- struct drm_sched_rq *rq = sched->sched_rq[i];
+ atomic_inc(&bad->karma);
- spin_lock(&rq->lock);
- list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
- if (bad->s_fence->scheduled.context ==
- entity->fence_context) {
- if (entity->guilty)
- atomic_set(entity->guilty, 1);
- break;
- }
- }
- spin_unlock(&rq->lock);
- if (&entity->list != &rq->entities)
- break;
+ spin_lock(&rq->lock);
+ list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
+ if (bad->s_fence->scheduled.context == entity->fence_context) {
+ if (entity->guilty)
+ atomic_set(entity->guilty, 1);
+ break;
}
}
+ spin_unlock(&rq->lock);
}
EXPORT_SYMBOL(drm_sched_increase_karma);
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index fd1cf89911e6..69ccdbd5af6c 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -16,7 +16,7 @@ drm_sched_entity_compare_before(struct rb_node *a, const struct rb_node *b)
return ktime_before(ea->oldest_job_waiting, eb->oldest_job_waiting);
}
-static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
+static void drm_sched_rq_remove_tree_locked(struct drm_sched_entity *entity,
struct drm_sched_rq *rq)
{
lockdep_assert_held(&entity->lock);
@@ -28,7 +28,7 @@ static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
}
}
-static void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+static void drm_sched_rq_update_tree_locked(struct drm_sched_entity *entity,
struct drm_sched_rq *rq,
ktime_t ts)
{
@@ -40,7 +40,7 @@ static void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
lockdep_assert_held(&entity->lock);
lockdep_assert_held(&rq->lock);
- drm_sched_rq_remove_fifo_locked(entity, rq);
+ drm_sched_rq_remove_tree_locked(entity, rq);
entity->oldest_job_waiting = ts;
@@ -65,16 +65,6 @@ void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
rq->sched = sched;
}
-static ktime_t
-drm_sched_rq_get_rr_deadline(struct drm_sched_rq *rq)
-{
- lockdep_assert_held(&rq->lock);
-
- rq->rr_deadline = ktime_add_ns(rq->rr_deadline, 1);
-
- return rq->rr_deadline;
-}
-
static ktime_t
drm_sched_rq_get_min_vruntime(struct drm_sched_rq *rq)
{
@@ -103,10 +93,11 @@ drm_sched_rq_get_min_vruntime(struct drm_sched_rq *rq)
* Returns a DRM scheduler pre-selected to handle this entity.
*/
struct drm_gpu_scheduler *
-drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts)
+drm_sched_rq_add_entity(struct drm_sched_entity *entity)
{
struct drm_gpu_scheduler *sched;
struct drm_sched_rq *rq;
+ ktime_t ts;
/* Add the entity to the run queue */
spin_lock(&entity->lock);
@@ -126,14 +117,9 @@ drm_sched_rq_add_entity(struct drm_sched_entity *entity, ktime_t ts)
list_add_tail(&entity->list, &rq->entities);
}
- if (drm_sched_policy == DRM_SCHED_POLICY_FAIR) {
- ts = drm_sched_rq_get_min_vruntime(rq);
- ts = drm_sched_entity_restore_vruntime(entity, ts);
- } else if (drm_sched_policy == DRM_SCHED_POLICY_RR) {
- ts = drm_sched_rq_get_rr_deadline(rq);
- }
-
- drm_sched_rq_update_fifo_locked(entity, rq, ts);
+ ts = drm_sched_rq_get_min_vruntime(rq);
+ ts = drm_sched_entity_restore_vruntime(entity, ts);
+ drm_sched_rq_update_tree_locked(entity, rq, ts);
spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
@@ -162,7 +148,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
atomic_dec(rq->sched->score);
list_del_init(&entity->list);
- drm_sched_rq_remove_fifo_locked(entity, rq);
+ drm_sched_rq_remove_tree_locked(entity, rq);
spin_unlock(&rq->lock);
}
@@ -183,23 +169,14 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
if (next_job) {
ktime_t ts;
- if (drm_sched_policy == DRM_SCHED_POLICY_FAIR)
- ts = drm_sched_entity_get_job_ts(entity);
- else if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
- ts = next_job->submit_ts;
- else
- ts = drm_sched_rq_get_rr_deadline(rq);
-
- drm_sched_rq_update_fifo_locked(entity, rq, ts);
+ ts = drm_sched_entity_get_job_ts(entity);
+ drm_sched_rq_update_tree_locked(entity, rq, ts);
} else {
- drm_sched_rq_remove_fifo_locked(entity, rq);
+ ktime_t min_vruntime;
- if (drm_sched_policy == DRM_SCHED_POLICY_FAIR) {
- ktime_t min_vruntime;
-
- min_vruntime = drm_sched_rq_get_min_vruntime(rq);
- drm_sched_entity_save_vruntime(entity, min_vruntime);
- }
+ drm_sched_rq_remove_tree_locked(entity, rq);
+ min_vruntime = drm_sched_rq_get_min_vruntime(rq);
+ drm_sched_entity_save_vruntime(entity, min_vruntime);
}
spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index c6169cbf909b..e9ff24c076aa 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -151,8 +151,6 @@ struct drm_sched_entity {
*/
struct spsc_queue job_queue;
- enum drm_sched_priority rq_priority;
-
/**
* @fence_seq:
*
@@ -339,13 +337,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
struct drm_sched_job {
u64 id;
- /**
- * @submit_ts:
- *
- * When the job was pushed into the entity queue.
- */
- ktime_t submit_ts;
-
/**
* @sched:
*
@@ -557,9 +548,7 @@ struct drm_gpu_scheduler {
atomic_t credit_count;
long timeout;
const char *name;
- u32 num_rqs;
- u32 num_user_rqs;
- struct drm_sched_rq **sched_rq;
+ struct drm_sched_rq *rq;
wait_queue_head_t job_scheduled;
atomic64_t job_id_count;
struct workqueue_struct *submit_wq;
--
2.48.0
* [RFC v4 15/16] drm/sched: Queue all free credits in one worker invocation
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (13 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 14/16] drm/sched: Remove FIFO and RR and simplify to a single run queue Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 16/16] drm/sched: Embed run queue singleton into the scheduler Tvrtko Ursulin
` (2 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner
There is no reason to queue just a single job, and then re-queue the worker
to queue the next one, if the scheduler can take more. We can simply feed
the hardware with as much as it can take in one go and hopefully win some
latency.
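In other words, the reworked drm_sched_run_job_work() conceptually becomes a
loop along these lines (simplified sketch only; the credit limit truncation
and the fence/error handling are elided, see the diff below for the real
thing):
	while (!READ_ONCE(sched->pause_submit)) {
		entity = drm_sched_rq_select_entity(sched, sched->rq);
		if (!entity)
			break;		/* No more ready work */
		sched_job = drm_sched_entity_queue_peek(entity);
		if (sched_job->credits > drm_sched_available_credits(sched)) {
			complete_all(&entity->entity_idle);
			break;		/* Hardware is full for now */
		}
		sched_job = drm_sched_entity_pop_job(entity);
		if (!sched_job) {
			/* Top entity not runnable after all */
			complete_all(&entity->entity_idle);
			continue;
		}
		submitted_credits += sched_job->credits;
		/* ... run_job(), fence bookkeeping, job completion ... */
		complete_all(&entity->entity_idle);
	}
	if (submitted_credits)
		wake_up(&sched->job_scheduled);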
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/sched_internal.h | 2 -
drivers/gpu/drm/scheduler/sched_main.c | 127 ++++++++++-----------
drivers/gpu/drm/scheduler/sched_rq.c | 12 +-
3 files changed, 59 insertions(+), 82 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index 2d55f265a092..c1f523bc9379 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -15,8 +15,6 @@ struct drm_sched_entity_stats {
u64 vruntime;
};
-bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
- struct drm_sched_entity *entity);
void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2be0a097e75..44222cfe4dc0 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -94,35 +94,6 @@ static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
return credits;
}
-/**
- * drm_sched_can_queue -- Can we queue more to the hardware?
- * @sched: scheduler instance
- * @entity: the scheduler entity
- *
- * Return true if we can push at least one more job from @entity, false
- * otherwise.
- */
-bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
- struct drm_sched_entity *entity)
-{
- struct drm_sched_job *s_job;
-
- s_job = drm_sched_entity_queue_peek(entity);
- if (!s_job)
- return false;
-
- /* If a job exceeds the credit limit, truncate it to the credit limit
- * itself to guarantee forward progress.
- */
- if (s_job->credits > sched->credit_limit) {
- dev_WARN(sched->dev,
- "Jobs may not exceed the credit limit, truncate.\n");
- s_job->credits = sched->credit_limit;
- }
-
- return drm_sched_available_credits(sched) >= s_job->credits;
-}
-
/**
* drm_sched_run_job_queue - enqueue run-job work
* @sched: scheduler instance
@@ -935,54 +906,72 @@ static void drm_sched_run_job_work(struct work_struct *w)
{
struct drm_gpu_scheduler *sched =
container_of(w, struct drm_gpu_scheduler, work_run_job);
+ u32 job_credits, submitted_credits = 0;
struct drm_sched_entity *entity;
- struct dma_fence *fence;
struct drm_sched_fence *s_fence;
struct drm_sched_job *sched_job;
- int r;
+ struct dma_fence *fence;
- /* Find entity with a ready job */
- entity = drm_sched_rq_select_entity(sched, sched->rq);
- if (IS_ERR_OR_NULL(entity))
- return; /* No more work */
+ while (!READ_ONCE(sched->pause_submit)) {
+ /* Find entity with a ready job */
+ entity = drm_sched_rq_select_entity(sched, sched->rq);
+ if (!entity)
+ break; /* No more work */
+
+ /*
+ * If a job exceeds the credit limit truncate it to guarantee
+ * forward progress.
+ */
+ sched_job = drm_sched_entity_queue_peek(entity);
+ job_credits = sched_job->credits;
+ if (dev_WARN_ONCE(sched->dev, job_credits > sched->credit_limit,
+ "Jobs may not exceed the credit limit, truncating.\n"))
+ job_credits = sched_job->credits = sched->credit_limit;
+
+ if (job_credits > drm_sched_available_credits(sched)) {
+ complete_all(&entity->entity_idle);
+ break;
+ }
+
+ sched_job = drm_sched_entity_pop_job(entity);
+ if (!sched_job) {
+ /* Top entity is not yet runnable after all */
+ complete_all(&entity->entity_idle);
+ continue;
+ }
+
+ s_fence = sched_job->s_fence;
+ drm_sched_job_begin(sched_job);
+ trace_drm_run_job(sched_job, entity);
+ submitted_credits += job_credits;
+ atomic_add(job_credits, &sched->credit_count);
+
+ fence = sched->ops->run_job(sched_job);
+ drm_sched_fence_scheduled(s_fence, fence);
+
+ if (!IS_ERR_OR_NULL(fence)) {
+ int r;
+
+ /* Drop for original kref_init of the fence */
+ dma_fence_put(fence);
+
+ r = dma_fence_add_callback(fence, &sched_job->cb,
+ drm_sched_job_done_cb);
+ if (r == -ENOENT)
+ drm_sched_job_done(sched_job, fence->error);
+ else if (r)
+ DRM_DEV_ERROR(sched->dev,
+ "fence add callback failed (%d)\n", r);
+ } else {
+ drm_sched_job_done(sched_job, IS_ERR(fence) ?
+ PTR_ERR(fence) : 0);
+ }
- sched_job = drm_sched_entity_pop_job(entity);
- if (!sched_job) {
complete_all(&entity->entity_idle);
- drm_sched_run_job_queue(sched);
- return;
}
- s_fence = sched_job->s_fence;
-
- atomic_add(sched_job->credits, &sched->credit_count);
- drm_sched_job_begin(sched_job);
-
- trace_drm_run_job(sched_job, entity);
- /*
- * The run_job() callback must by definition return a fence whose
- * refcount has been incremented for the scheduler already.
- */
- fence = sched->ops->run_job(sched_job);
- complete_all(&entity->entity_idle);
- drm_sched_fence_scheduled(s_fence, fence);
-
- if (!IS_ERR_OR_NULL(fence)) {
- r = dma_fence_add_callback(fence, &sched_job->cb,
- drm_sched_job_done_cb);
- if (r == -ENOENT)
- drm_sched_job_done(sched_job, fence->error);
- else if (r)
- DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
-
- dma_fence_put(fence);
- } else {
- drm_sched_job_done(sched_job, IS_ERR(fence) ?
- PTR_ERR(fence) : 0);
- }
-
- wake_up(&sched->job_scheduled);
- drm_sched_run_job_queue(sched);
+ if (submitted_credits)
+ wake_up(&sched->job_scheduled);
}
static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index 69ccdbd5af6c..b18265c7f073 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -190,9 +190,7 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
*
* Find oldest waiting ready entity.
*
- * Return an entity if one is found; return an error-pointer (!NULL) if an
- * entity was ready, but the scheduler had insufficient credits to accommodate
- * its job; return NULL, if no ready entity was found.
+ * Return an entity if one is found or NULL if no ready entity was found.
*/
struct drm_sched_entity *
drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
@@ -206,14 +204,6 @@ drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
if (drm_sched_entity_is_ready(entity)) {
- /* If we can't queue yet, preserve the current entity in
- * terms of fairness.
- */
- if (!drm_sched_can_queue(sched, entity)) {
- spin_unlock(&rq->lock);
- return ERR_PTR(-ENOSPC);
- }
-
reinit_completion(&entity->entity_idle);
break;
}
--
2.48.0
* [RFC v4 16/16] drm/sched: Embed run queue singleton into the scheduler
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (14 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 15/16] drm/sched: Queue all free credits in one worker invocation Tvrtko Ursulin
@ 2025-04-25 10:20 ` Tvrtko Ursulin
2025-05-12 13:05 ` Philipp Stanner
2025-04-29 7:25 ` [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
2025-05-19 16:51 ` Pierre-Eric Pelloux-Prayer
17 siblings, 1 reply; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-25 10:20 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Tvrtko Ursulin, Christian König,
Danilo Krummrich, Matthew Brost, Philipp Stanner
Now that the run queue to scheduler relationship is always 1:1, we can
embed the run queue directly in the scheduler struct and drop the
associated allocation and error handling code.
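The main mechanical change, repeated throughout the diff, is that code which
previously followed entity->rq->sched now recovers the scheduler from the
embedded run queue with container_of, along the lines of:
	struct drm_gpu_scheduler *sched =
		container_of(entity->rq, struct drm_gpu_scheduler, rq);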
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 6 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 5 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 8 ++++--
drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 8 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 8 +++---
drivers/gpu/drm/scheduler/sched_entity.c | 32 +++++++++------------
drivers/gpu/drm/scheduler/sched_fence.c | 2 +-
drivers/gpu/drm/scheduler/sched_internal.h | 6 ++--
drivers/gpu/drm/scheduler/sched_main.c | 31 ++++----------------
drivers/gpu/drm/scheduler/sched_rq.c | 18 ++++++------
include/drm/gpu_scheduler.h | 5 +---
12 files changed, 58 insertions(+), 77 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 82df06a72ee0..e18e180bf32c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1108,7 +1108,8 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
if (p->gang_size > 1 && !adev->vm_manager.concurrent_flush) {
for (i = 0; i < p->gang_size; ++i) {
struct drm_sched_entity *entity = p->entities[i];
- struct drm_gpu_scheduler *sched = entity->rq->sched;
+ struct drm_gpu_scheduler *sched =
+ container_of(entity->rq, typeof(*sched), rq);
struct amdgpu_ring *ring = to_amdgpu_ring(sched);
if (amdgpu_vmid_uses_reserved(vm, ring->vm_hub))
@@ -1236,7 +1237,8 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
return r;
}
- sched = p->gang_leader->base.entity->rq->sched;
+ sched = container_of(p->gang_leader->base.entity->rq, typeof(*sched),
+ rq);
while ((fence = amdgpu_sync_get_fence(&p->sync))) {
struct drm_sched_fence *s_fence = to_drm_sched_fence(fence);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 9440af58073b..e3d4f7503738 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -359,7 +359,9 @@ static struct dma_fence *
amdgpu_job_prepare_job(struct drm_sched_job *sched_job,
struct drm_sched_entity *s_entity)
{
- struct amdgpu_ring *ring = to_amdgpu_ring(s_entity->rq->sched);
+ struct drm_gpu_scheduler *sched =
+ container_of(s_entity->rq, typeof(*sched), rq);
+ struct amdgpu_ring *ring = to_amdgpu_ring(sched);
struct amdgpu_job *job = to_amdgpu_job(sched_job);
struct dma_fence *fence;
int r;
@@ -459,7 +461,7 @@ drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
{
- struct drm_sched_rq *rq = sched->rq;
+ struct drm_sched_rq *rq = &sched->rq;
struct drm_sched_entity *s_entity;
struct drm_sched_job *s_job;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index ce6b9ba967ff..d6872baeba1e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -85,7 +85,10 @@ struct amdgpu_job {
static inline struct amdgpu_ring *amdgpu_job_ring(struct amdgpu_job *job)
{
- return to_amdgpu_ring(job->base.entity->rq->sched);
+ struct drm_gpu_scheduler *sched =
+ container_of(job->base.entity->rq, typeof(*sched), rq);
+
+ return to_amdgpu_ring(sched);
}
int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 11dd2e0f7979..197d20a37afb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -145,6 +145,7 @@ TRACE_EVENT(amdgpu_cs,
struct amdgpu_ib *ib),
TP_ARGS(p, job, ib),
TP_STRUCT__entry(
+ __field(struct drm_gpu_scheduler *, sched)
__field(struct amdgpu_bo_list *, bo_list)
__field(u32, ring)
__field(u32, dw)
@@ -152,11 +153,14 @@ TRACE_EVENT(amdgpu_cs,
),
TP_fast_assign(
+ __entry->sched = container_of(job->base.entity->rq,
+ typeof(*__entry->sched),
+ rq);
__entry->bo_list = p->bo_list;
- __entry->ring = to_amdgpu_ring(job->base.entity->rq->sched)->idx;
+ __entry->ring = to_amdgpu_ring(__entry->sched)->idx;
__entry->dw = ib->length_dw;
__entry->fences = amdgpu_fence_count_emitted(
- to_amdgpu_ring(job->base.entity->rq->sched));
+ to_amdgpu_ring(__entry->sched));
),
TP_printk("bo_list=%p, ring=%u, dw=%u, fences=%u",
__entry->bo_list, __entry->ring, __entry->dw,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index 46d9fb433ab2..42f2bfb30af1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -105,13 +105,13 @@ static int amdgpu_vm_sdma_prepare(struct amdgpu_vm_update_params *p,
static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
struct dma_fence **fence)
{
+ struct drm_gpu_scheduler *sched =
+ container_of(p->vm->delayed.rq, typeof(*sched), rq);
+ struct amdgpu_ring *ring =
+ container_of(sched, struct amdgpu_ring, sched);
struct amdgpu_ib *ib = p->job->ibs;
- struct amdgpu_ring *ring;
struct dma_fence *f;
- ring = container_of(p->vm->delayed.rq->sched, struct amdgpu_ring,
- sched);
-
WARN_ON(ib->length_dw == 0);
amdgpu_ring_pad_ib(ring, ib);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index 23b6f7a4aa4a..ab132dae8183 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -420,15 +420,15 @@ int amdgpu_xcp_open_device(struct amdgpu_device *adev,
void amdgpu_xcp_release_sched(struct amdgpu_device *adev,
struct amdgpu_ctx_entity *entity)
{
- struct drm_gpu_scheduler *sched;
- struct amdgpu_ring *ring;
+ struct drm_gpu_scheduler *sched =
+ container_of(entity->entity.rq, typeof(*sched), rq);
if (!adev->xcp_mgr)
return;
- sched = entity->entity.rq->sched;
if (drm_sched_wqueue_ready(sched)) {
- ring = to_amdgpu_ring(entity->entity.rq->sched);
+ struct amdgpu_ring *ring = to_amdgpu_ring(sched);
+
atomic_dec(&adev->xcp_mgr->xcp[ring->xcp_id].ref_cnt);
}
}
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index d149df2a2050..bc890f735552 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -104,19 +104,12 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
* is initialized itself.
*/
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
+ if (num_sched_list) {
+ entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
+ entity->rq = &sched_list[0]->rq;
+ }
RCU_INIT_POINTER(entity->last_scheduled, NULL);
RB_CLEAR_NODE(&entity->rb_tree_node);
-
- if (num_sched_list && !sched_list[0]->rq) {
- /* Since every entry covered by num_sched_list
- * should be non-NULL and therefore we warn drivers
- * not to do this and to fix their DRM calling order.
- */
- pr_warn("%s: called with uninitialized scheduler\n", __func__);
- } else if (num_sched_list) {
- entity->rq = sched_list[0]->rq;
- }
-
init_completion(&entity->entity_idle);
/* We start in an idle state. */
@@ -303,7 +296,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
if (!entity->rq)
return 0;
- sched = entity->rq->sched;
+ sched = container_of(entity->rq, typeof(*sched), rq);
/**
* The client will not queue more IBs during this fini, consume existing
* queued IBs or discard them on SIGKILL
@@ -395,9 +388,11 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
{
struct drm_sched_entity *entity =
container_of(cb, struct drm_sched_entity, cb);
+ struct drm_gpu_scheduler *sched =
+ container_of(entity->rq, typeof(*sched), rq);
drm_sched_entity_clear_dep(f, cb);
- drm_sched_wakeup(entity->rq->sched);
+ drm_sched_wakeup(sched);
}
/**
@@ -423,7 +418,8 @@ EXPORT_SYMBOL(drm_sched_entity_set_priority);
*/
static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
{
- struct drm_gpu_scheduler *sched = entity->rq->sched;
+ struct drm_gpu_scheduler *sched =
+ container_of(entity->rq, typeof(*sched), rq);
struct dma_fence *fence = entity->dependency;
struct drm_sched_fence *s_fence;
@@ -562,7 +558,7 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
spin_lock(&entity->lock);
sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
- rq = sched ? sched->rq : NULL;
+ rq = sched ? &sched->rq : NULL;
if (rq != entity->rq) {
drm_sched_rq_remove_entity(entity->rq, entity);
entity->rq = rq;
@@ -585,10 +581,12 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
{
struct drm_sched_entity *entity = sched_job->entity;
+ struct drm_gpu_scheduler *sched =
+ container_of(entity->rq, typeof(*sched), rq);
bool first;
trace_drm_sched_job(sched_job, entity);
- atomic_inc(entity->rq->sched->score);
+ atomic_inc(sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
/*
@@ -599,8 +597,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
/* first job wakes up scheduler */
if (first) {
- struct drm_gpu_scheduler *sched;
-
sched = drm_sched_rq_add_entity(entity);
if (sched)
drm_sched_wakeup(sched);
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index e971528504a5..bb48e690862d 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -225,7 +225,7 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
{
unsigned seq;
- fence->sched = entity->rq->sched;
+ fence->sched = container_of(entity->rq, typeof(*fence->sched), rq);
seq = atomic_inc_return(&entity->fence_seq);
dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
&fence->lock, entity->fence_context, seq);
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h b/drivers/gpu/drm/scheduler/sched_internal.h
index c1f523bc9379..df8684689962 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -17,11 +17,9 @@ struct drm_sched_entity_stats {
void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
-void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
- struct drm_sched_rq *rq);
+void drm_sched_rq_init(struct drm_gpu_scheduler *sched);
struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
- struct drm_sched_rq *rq);
+drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched);
struct drm_gpu_scheduler *
drm_sched_rq_add_entity(struct drm_sched_entity *entity);
void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 44222cfe4dc0..d2a2202dac3a 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -572,7 +572,7 @@ void drm_sched_job_arm(struct drm_sched_job *job)
BUG_ON(!entity);
drm_sched_entity_select_rq(entity);
- sched = entity->rq->sched;
+ sched = container_of(entity->rq, typeof(*sched), rq);
job->sched = sched;
job->s_priority = entity->priority;
@@ -914,7 +914,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
while (!READ_ONCE(sched->pause_submit)) {
/* Find entity with a ready job */
- entity = drm_sched_rq_select_entity(sched, sched->rq);
+ entity = drm_sched_rq_select_entity(sched);
if (!entity)
break; /* No more work */
@@ -1006,15 +1006,6 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
sched->score = args->score ? args->score : &sched->_score;
sched->dev = args->dev;
- if (sched->rq) {
- /* Not an error, but warn anyway so drivers can
- * fine-tune their DRM calling order, and return all
- * is good.
- */
- dev_warn(sched->dev, "%s: scheduler already initialized!\n", __func__);
- return 0;
- }
-
if (args->submit_wq) {
sched->submit_wq = args->submit_wq;
sched->own_submit_wq = false;
@@ -1026,11 +1017,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
sched->own_submit_wq = true;
}
- sched->rq = kmalloc(sizeof(*sched->rq), GFP_KERNEL | __GFP_ZERO);
- if (!sched->rq)
- goto Out_check_own;
-
- drm_sched_rq_init(sched, sched->rq);
+ drm_sched_rq_init(sched);
init_waitqueue_head(&sched->job_scheduled);
INIT_LIST_HEAD(&sched->pending_list);
@@ -1045,12 +1032,6 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
sched->ready = true;
return 0;
-
-Out_check_own:
- if (sched->own_submit_wq)
- destroy_workqueue(sched->submit_wq);
- dev_err(sched->dev, "%s: Failed to setup GPU scheduler--out of memory\n", __func__);
- return -ENOMEM;
}
EXPORT_SYMBOL(drm_sched_init);
@@ -1078,7 +1059,7 @@ EXPORT_SYMBOL(drm_sched_init);
void drm_sched_fini(struct drm_gpu_scheduler *sched)
{
- struct drm_sched_rq *rq = sched->rq;
+ struct drm_sched_rq *rq = &sched->rq;
struct drm_sched_entity *s_entity;
drm_sched_wqueue_stop(sched);
@@ -1102,8 +1083,6 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
if (sched->own_submit_wq)
destroy_workqueue(sched->submit_wq);
sched->ready = false;
- kfree(sched->rq);
- sched->rq = NULL;
}
EXPORT_SYMBOL(drm_sched_fini);
@@ -1120,7 +1099,7 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
{
struct drm_gpu_scheduler *sched = bad->sched;
struct drm_sched_entity *entity, *tmp;
- struct drm_sched_rq *rq = sched->rq;
+ struct drm_sched_rq *rq = &sched->rq;
/* don't change @bad's karma if it's from KERNEL RQ,
* because sometimes GPU hang would cause kernel jobs (like VM updating jobs)
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index b18265c7f073..f2f10f7d6ddf 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -52,17 +52,16 @@ static void drm_sched_rq_update_tree_locked(struct drm_sched_entity *entity,
* drm_sched_rq_init - initialize a given run queue struct
*
* @sched: scheduler instance to associate with this run queue
- * @rq: scheduler run queue
*
* Initializes a scheduler runqueue.
*/
-void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
- struct drm_sched_rq *rq)
+void drm_sched_rq_init(struct drm_gpu_scheduler *sched)
{
+ struct drm_sched_rq *rq = &sched->rq;
+
spin_lock_init(&rq->lock);
INIT_LIST_HEAD(&rq->entities);
rq->rb_tree_root = RB_ROOT_CACHED;
- rq->sched = sched;
}
static ktime_t
@@ -109,8 +108,8 @@ drm_sched_rq_add_entity(struct drm_sched_entity *entity)
}
rq = entity->rq;
+ sched = container_of(rq, typeof(*sched), rq);
spin_lock(&rq->lock);
- sched = rq->sched;
if (list_empty(&entity->list)) {
atomic_inc(sched->score);
@@ -138,6 +137,8 @@ drm_sched_rq_add_entity(struct drm_sched_entity *entity)
void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
struct drm_sched_entity *entity)
{
+ struct drm_gpu_scheduler *sched = container_of(rq, typeof(*sched), rq);
+
lockdep_assert_held(&entity->lock);
if (list_empty(&entity->list))
@@ -145,7 +146,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
spin_lock(&rq->lock);
- atomic_dec(rq->sched->score);
+ atomic_dec(sched->score);
list_del_init(&entity->list);
drm_sched_rq_remove_tree_locked(entity, rq);
@@ -186,16 +187,15 @@ void drm_sched_rq_pop_entity(struct drm_sched_entity *entity)
* drm_sched_rq_select_entity - Select an entity which provides a job to run
*
* @sched: the gpu scheduler
- * @rq: scheduler run queue to check.
*
* Find oldest waiting ready entity.
*
* Return an entity if one is found or NULL if no ready entity was found.
*/
struct drm_sched_entity *
-drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
- struct drm_sched_rq *rq)
+drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched)
{
+ struct drm_sched_rq *rq = &sched->rq;
struct rb_node *rb;
spin_lock(&rq->lock);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index e9ff24c076aa..fd488ccece9a 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -242,7 +242,6 @@ struct drm_sched_entity {
/**
* struct drm_sched_rq - queue of entities to be scheduled.
*
- * @sched: the scheduler to which this rq belongs to.
* @lock: protects @entities, @rb_tree_root and @rr_deadline.
* @entities: list of the entities to be scheduled.
* @rb_tree_root: root of time based priority queue of entities for FIFO scheduling
@@ -252,8 +251,6 @@ struct drm_sched_entity {
* the next entity to emit commands from.
*/
struct drm_sched_rq {
- struct drm_gpu_scheduler *sched;
-
spinlock_t lock;
/* Following members are protected by the @lock: */
ktime_t rr_deadline;
@@ -548,7 +545,7 @@ struct drm_gpu_scheduler {
atomic_t credit_count;
long timeout;
const char *name;
- struct drm_sched_rq *rq;
+ struct drm_sched_rq rq;
wait_queue_head_t job_scheduled;
atomic64_t job_id_count;
struct workqueue_struct *submit_wq;
--
2.48.0
* Re: [RFC v4 00/16] Fair DRM scheduler
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (15 preceding siblings ...)
2025-04-25 10:20 ` [RFC v4 16/16] drm/sched: Embed run queue singleton into the scheduler Tvrtko Ursulin
@ 2025-04-29 7:25 ` Tvrtko Ursulin
2025-05-19 16:51 ` Pierre-Eric Pelloux-Prayer
17 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-04-29 7:25 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Leo Liu,
Matthew Brost, Philipp Stanner, Pierre-Eric Pelloux-Prayer,
Michel Dänzer
On 25/04/2025 11:20, Tvrtko Ursulin wrote:
> V4 is quite different from v3 in that I have replaced the deadline + queue-depth
> approach with a fair GPU time based approach. This is because Pierre-Eric found
> a viewperf workload which showed queue-depth based approach regressing and
> without it there was a regression on one of my synthetic workloads I was not
> happy with.
>
> In my experiments the fair scheduler looks solid so lets see how it fares after
> wider testing.
>
> On the high level main advantages of the series are:
>
> 1. Scheduling quality - schedules better than FIFO.
> 2. Code simplification - no more multiple run queues.
One important benefit which I forgot to list:
3. Enables the DRM scheduling cgroup controller
If you remember that older RFC of mine, it worked by exposing a cgroup
drm.weight control. Because the fair scheduler tracks per-entity GPU time
and schedules by the vruntime criterion, where vruntime = entity->scale *
runtime, it is conceptually trivial to fold the group's relative weight
into that scale.
That should make drm.weight just work for all drivers which use the DRM
scheduler, with no need to modify the drivers themselves.
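To illustrate the idea only, a minimal sketch (not code from this series;
DRM_CGROUP_WEIGHT_DEFAULT and the notion of a per-client cgroup weight are
made-up placeholders here):
/*
 * Hypothetical: cgroup_weight would come from the owning client's
 * drm.weight. A heavier weight shrinks the scale, so vruntime grows more
 * slowly and the group earns a larger share of the GPU.
 */
#define DRM_CGROUP_WEIGHT_DEFAULT	100

static u64 drm_sched_effective_scale(u64 priority_scale,
				     unsigned int cgroup_weight)
{
	/* div64_u64() from <linux/math64.h> */
	return div64_u64(priority_scale * DRM_CGROUP_WEIGHT_DEFAULT,
			 cgroup_weight);
}

/* vruntime would then advance by effective scale * GPU time consumed. */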
I am planning to send that RFC out in due time.
Regards,
Tvrtko
> First patches add some unit tests which allow for easy evaluation of scheduling
> behaviour against different client submission patterns. From there onwards it is
> hopefully a natural progression of cleanups, enablers, adding the fair policy,
> and finally removing FIFO and RR and simplifying the code base due not more need
> for multiple run queues.
>
> As a headline result I have tested three simultaneous clients on the Steam Deck:
>
> One instance of a deferredmultisampling Vulkan demo running with low priority,
> one normal priority instance of the same demo, and the Unigine Heaven benchmark.
>
> With the FIFO scheduler we can see that the low priority client is completely
> starved and the GPU time distribution between the other two clients is uneven:
>
> https://people.igalia.com/tursulin/drm-sched-fair/fifo-starvation.png
>
> Switching to the fair scheduler, GPU time distribution is almost equal and the
> low priority client does get a small share of the GPU:
>
> https://people.igalia.com/tursulin/drm-sched-fair/fair-no-starvation.png
>
> Moving onto the synthetic submission patterns, they are about two simultaneous
> clients which broadly cover the following categories:
>
> * Deep queue clients
> * Hogs versus interactive
> * Priority handling
>
> Lets look at the results:
>
> 1. Two normal priority deep queue clients.
>
> These ones submit one second worth of 8ms jobs. As fast as they can, no
> dependencies etc. There is no difference in runtime between FIFO and fair but
> the latter allows both clients to progress with work more evenly:
>
> https://people.igalia.com/tursulin/drm-sched-fair/normal-normal.png
>
> (X axis is time, Y is submitted queue-depth, hence lowering of qd corresponds
> with work progress for both clients, tested with both schedulers separately.)
>
> 2. Same two clients but one is now low priority.
>
> https://people.igalia.com/tursulin/drm-sched-fair/normal-low.png
>
> Normal priority client is a solid line, low priority dotted. We can see how FIFO
> completely starves the low priority client until the normal priority is fully
> done. Only then the low priority client gets any GPU time.
>
> In constrast, fair scheduler allows some GPU time to the low priority client.
>
> 3. Same clients but now high versus normal priority.
>
> Similar behaviour as in the previous one with normal a bit less de-prioritised
> relative to high, than low was against normal.
>
> https://people.igalia.com/tursulin/drm-sched-fair/high-normal.png
>
> 4. Heavy load vs interactive client.
>
> Heavy client emits a 75% GPU load in the format of 3x 2.5ms jobs followed by a
> 2.5ms wait. Interactive client emits a 10% GPU load in the format of 1x 1ms job
> followed by a 9ms wait.
>
> This simulates an interactive graphical client used on top of a relatively heavy
> background load but no GPU oversubscription.
>
> Graphs show the interactive client only and from now on, instead of looking at
> the client's queue depth, we look at its "fps".
>
> https://people.igalia.com/tursulin/drm-sched-fair/heavy-interactive.png
>
> We can see that fair scheduler allows a higher fps for the interactive client
> which is good.
>
> 5. An even heavier load vs interactive client.
>
> This one is oversubscribing the GPU by submitting 4x 50ms jobs and waiting for
> only one microsecond before repeating the cycle. Interactive client is the same
> 10% as above.
>
> https://people.igalia.com/tursulin/drm-sched-fair/veryheavy-interactive.png
>
> Here the difference is even more dramatic with fair scheduler enabling ~3x the
> framerate for the interactive client.
>
> 6. Low priority GPU hog versus heavy-interactive.
>
> Low priority client: 3x 2.5ms jobs client followed by a 0.5ms wait.
> Interactive client: 1x 0.5ms job followed by a 10ms wait.
>
> https://people.igalia.com/tursulin/drm-sched-fair/lowhog-interactive.png
>
> Slight win for the fair scheduler but could be just noise.
>
> 7. Last set of test scenarios will have three subgroups.
>
> In all cases we have two interactive (synchronous, single job at a time) clients
> with a 50% "duty cycle" GPU time usage.
>
> Client 1: 1.5ms job + 1.5ms wait (aka short bursty)
> Client 2: 2.5ms job + 2.5ms wait (aka long bursty)
>
> a) Both normal priority.
>
> https://people.igalia.com/tursulin/drm-sched-fair/5050-short.png
> https://people.igalia.com/tursulin/drm-sched-fair/5050-long.png
>
> Both schedulers favour the higher frequency duty cycle with fair giving it a
> little bit more which should be good for interactivity.
>
> b) Normal vs low priority.
>
> https://people.igalia.com/tursulin/drm-sched-fair/5050-normal-low-normal.png
> https://people.igalia.com/tursulin/drm-sched-fair/5050-normal-low-low.png
>
> Fair scheduler gives a bit more GPU time to the normal priority client which is
> again good.
>
> c) High vs normal priority.
>
> https://people.igalia.com/tursulin/drm-sched-fair/5050-high-normal-high.png
> https://people.igalia.com/tursulin/drm-sched-fair/5050-high-normal-normal.png
>
> Again, fair scheduler gives a bit more share to the higher priority client.
>
> On the overall fair looks like a potential improvement in terms of fairness,
> especially avoiding priority starvation. There do not appear to be any
> regressions with the tested workloads.
>
> As before, I am looking for feedback, ideas for what kind of submission
> scenarios to test. Testers on different GPUs would be very welcome too.
>
> And I should probably test round-robin at some point, to see if we are maybe
> okay to drop it unconditionally, or further work improving fair would be needed
> if some use cases rely on round-robin.
>
> v2:
> * Fixed many rebase errors.
> * Added some new patches.
> * Dropped single shot dependecy handling.
>
> v3:
> * Added scheduling quality unit tests.
> * Refined a tiny bit by adding some fairness.
> * Dropped a few patches for now.
>
> v4:
> * Replaced deadline with fair!
> * Refined scheduling quality unit tests.
> * Pulled one cleanup patch earlier.
> * Fixed "drm/sched: Avoid double re-lock on the job free path".
>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> CC: Leo Liu <Leo.Liu@amd.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> Cc: Michel Dänzer <michel.daenzer@mailbox.org>
>
> Tvrtko Ursulin (16):
> drm/sched: Add some scheduling quality unit tests
> drm/sched: Add some more scheduling quality unit tests
> drm/sched: De-clutter drm_sched_init
> drm/sched: Avoid double re-lock on the job free path
> drm/sched: Consolidate drm_sched_job_timedout
> drm/sched: Consolidate drm_sched_rq_select_entity_rr
> drm/sched: Implement RR via FIFO
> drm/sched: Consolidate entity run queue management
> drm/sched: Move run queue related code into a separate file
> drm/sched: Free all finished jobs at once
> drm/sched: Account entity GPU time
> drm/sched: Remove idle entity from tree
> drm/sched: Add fair scheduling policy
> drm/sched: Remove FIFO and RR and simplify to a single run queue
> drm/sched: Queue all free credits in one worker invocation
> drm/sched: Embed run queue singleton into the scheduler
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 27 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 5 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 8 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 8 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 8 +-
> drivers/gpu/drm/scheduler/Makefile | 2 +-
> drivers/gpu/drm/scheduler/sched_entity.c | 121 +--
> drivers/gpu/drm/scheduler/sched_fence.c | 2 +-
> drivers/gpu/drm/scheduler/sched_internal.h | 114 ++-
> drivers/gpu/drm/scheduler/sched_main.c | 570 +++---------
> drivers/gpu/drm/scheduler/sched_rq.c | 214 +++++
> drivers/gpu/drm/scheduler/tests/Makefile | 3 +-
> .../gpu/drm/scheduler/tests/tests_scheduler.c | 815 ++++++++++++++++++
> include/drm/gpu_scheduler.h | 23 +-
> 15 files changed, 1348 insertions(+), 578 deletions(-)
> create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c
> create mode 100644 drivers/gpu/drm/scheduler/tests/tests_scheduler.c
>
* Re: [RFC v4 01/16] drm/sched: Add some scheduling quality unit tests
2025-04-25 10:20 ` [RFC v4 01/16] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
@ 2025-04-29 15:03 ` Christian König
2025-04-29 15:45 ` Michel Dänzer
0 siblings, 1 reply; 35+ messages in thread
From: Christian König @ 2025-04-29 15:03 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Danilo Krummrich, Matthew Brost, Philipp Stanner,
Pierre-Eric Pelloux-Prayer
On 4/25/25 12:20, Tvrtko Ursulin wrote:
> To make evaluating different scheduling policies easier (no need for
> external benchmarks) and perfectly repetable, lets add some synthetic
Typo "repeatable".
> workloads built upon mock scheduler unit test infrastructure.
>
> Focus is on two parallel clients (two threads) submitting different job
> patterns and logging their progress and some overall metrics. This is
> repeated for both scheduler credit limit 1 and 2.
>
> Example test output:
>
> Normal and low:
> pct1 cps1 qd1; pct2 cps2 qd2
> + 0ms: 0 0 0; 0 0 0
> + 104ms: 100 1240 112; 100 1240 125
> + 209ms: 100 0 99; 100 0 125
> + 313ms: 100 0 86; 100 0 125
> + 419ms: 100 0 73; 100 0 125
> + 524ms: 100 0 60; 100 0 125
> + 628ms: 100 0 47; 100 0 125
> + 731ms: 100 0 34; 100 0 125
> + 836ms: 100 0 21; 100 0 125
> + 939ms: 100 0 8; 100 0 125
> + 1043ms: ; 100 0 120
> + 1147ms: ; 100 0 107
> + 1252ms: ; 100 0 94
> + 1355ms: ; 100 0 81
> + 1459ms: ; 100 0 68
> + 1563ms: ; 100 0 55
> + 1667ms: ; 100 0 42
> + 1771ms: ; 100 0 29
> + 1875ms: ; 100 0 16
> + 1979ms: ; 100 0 3
> 0: prio=normal sync=0 elapsed_ms=1015ms (ideal_ms=1000ms) cycle_time(min,avg,max)=134,222,978 us latency_time(min,avg,max)=134,222,978
> us
> 1: prio=low sync=0 elapsed_ms=2009ms (ideal_ms=1000ms) cycle_time(min,avg,max)=134,215,806 us latency_time(min,avg,max)=134,215,806 us
>
> There we have two clients represented in the two respective columns, with
> their progress logged roughly every 100 milliseconds. The metrics are:
>
> - pct - Percentage progress of the job submit part
> - cps - Cycles per second
> - qd - Queue depth - number of submitted unfinished jobs
>
> The cycles per second metric is inherent to the fact that workload
> patterns are a data driven cycling sequence of:
>
> - Submit 1..N jobs
> - Wait for Nth job to finish (optional)
> - Sleep (optional)
> - Repeat from start
>
> In this particular example we have a normal priority and a low priority
> clients both spamming the scheduler with 8ms jobs with no sync and no
> sleeping. Hence they build a very deep queues and we can see how the low
> priority client is completely starved until the normal finishes.
>
> Note that the PCT and CPS metrics are irrelevant for "unsync" clients
> since they manage to complete all of their cycles instantenuously.
Typo "instantaneously".
>
> A different example would be:
>
> Heavy and interactive:
> pct1 cps1 qd1; pct2 cps2 qd2
> + 0ms: 0 0 0; 0 0 0
> + 106ms: 5 40 3; 5 40 0
> + 209ms: 9 40 0; 9 40 0
> + 314ms: 14 50 3; 14 50 0
> + 417ms: 18 40 0; 18 40 0
> + 522ms: 23 50 3; 23 50 0
> + 625ms: 27 40 0; 27 40 1
> + 729ms: 32 50 0; 32 50 0
> + 833ms: 36 40 1; 36 40 0
> + 937ms: 40 40 0; 40 40 0
> + 1041ms: 45 50 0; 45 50 0
> + 1146ms: 49 40 1; 49 40 1
> + 1249ms: 54 50 0; 54 50 0
> + 1353ms: 58 40 1; 58 40 0
> + 1457ms: 62 40 0; 62 40 1
> + 1561ms: 67 50 0; 67 50 0
> + 1665ms: 71 40 1; 71 40 0
> + 1772ms: 76 50 0; 76 50 0
> + 1877ms: 80 40 1; 80 40 0
> + 1981ms: 84 40 0; 84 40 0
> + 2085ms: 89 50 0; 89 50 0
> + 2189ms: 93 40 1; 93 40 0
> + 2293ms: 97 40 0; 97 40 1
>
> In this case client one is submitting 3x 2.5ms jobs, waiting for the 3rd
> and then sleeping for 2.5ms (in effect causing 75% GPU load, minus the
> overheads). Second client is submitting 1ms jobs, waiting for each to
> finish and sleeping for 9ms (effective 10% GPU load). Here we can see
> the PCT and CPS reflecting real progress.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
I only skimmed over it, but it looks like exactly what we need.
Feel free to add Acked-by: Christian König <christian.koenig@amd.com>
Regards,
Christian.
> ---
> drivers/gpu/drm/scheduler/tests/Makefile | 3 +-
> .../gpu/drm/scheduler/tests/tests_scheduler.c | 631 ++++++++++++++++++
> 2 files changed, 633 insertions(+), 1 deletion(-)
> create mode 100644 drivers/gpu/drm/scheduler/tests/tests_scheduler.c
>
> diff --git a/drivers/gpu/drm/scheduler/tests/Makefile b/drivers/gpu/drm/scheduler/tests/Makefile
> index 5bf707bad373..9ec185fbbc15 100644
> --- a/drivers/gpu/drm/scheduler/tests/Makefile
> +++ b/drivers/gpu/drm/scheduler/tests/Makefile
> @@ -2,6 +2,7 @@
>
> drm-sched-tests-y := \
> mock_scheduler.o \
> - tests_basic.o
> + tests_basic.o \
> + tests_scheduler.o
>
> obj-$(CONFIG_DRM_SCHED_KUNIT_TEST) += drm-sched-tests.o
> diff --git a/drivers/gpu/drm/scheduler/tests/tests_scheduler.c b/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
> new file mode 100644
> index 000000000000..b66321ef7abe
> --- /dev/null
> +++ b/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
> @@ -0,0 +1,631 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2025 Valve Corporation */
> +
> +#include <linux/delay.h>
> +#include <linux/kthread.h>
> +#include <linux/ktime.h>
> +
> +#include "sched_tests.h"
> +
> +/*
> + * DRM scheduler scheduler tests exercise load balancing decisions ie. entity
> + * selection logic.
> + */
> +
> +static int drm_sched_scheduler_init(struct kunit *test)
> +{
> + struct drm_mock_scheduler *sched;
> +
> + sched = drm_mock_sched_new(test, MAX_SCHEDULE_TIMEOUT);
> + sched->base.credit_limit = 1;
> +
> + test->priv = sched;
> +
> + return 0;
> +}
> +
> +static int drm_sched_scheduler_init2(struct kunit *test)
> +{
> + struct drm_mock_scheduler *sched;
> +
> + sched = drm_mock_sched_new(test, MAX_SCHEDULE_TIMEOUT);
> + sched->base.credit_limit = 2;
> +
> + test->priv = sched;
> +
> + return 0;
> +}
> +
> +static void drm_sched_scheduler_exit(struct kunit *test)
> +{
> + struct drm_mock_scheduler *sched = test->priv;
> +
> + drm_mock_sched_fini(sched);
> +}
> +
> +static void drm_sched_scheduler_queue_overhead(struct kunit *test)
> +{
> + struct drm_mock_scheduler *sched = test->priv;
> + struct drm_mock_sched_entity *entity;
> + const unsigned int job_us = 1000;
> + const unsigned int jobs = 1000;
> + const unsigned int total_us = jobs * job_us;
> + struct drm_mock_sched_job *job, *first;
> + ktime_t start, end;
> + bool done;
> + int i;
> +
> + /*
> + * Deep queue job at a time processing (single credit).
> + *
> + * This measures the overhead of picking and processing a job at a time
> + * by comparing the ideal total "GPU" time of all submitted jobs versus
> + * the time actually taken.
> + */
> +
> + KUNIT_ASSERT_EQ(test, sched->base.credit_limit, 1);
> +
> + entity = drm_mock_sched_entity_new(test,
> + DRM_SCHED_PRIORITY_NORMAL,
> + sched);
> +
> + for (i = 0; i <= jobs; i++) {
> + job = drm_mock_sched_job_new(test, entity);
> + if (i == 0)
> + first = job; /* Extra first job blocks the queue */
> + else
> + drm_mock_sched_job_set_duration_us(job, job_us);
> + drm_mock_sched_job_submit(job);
> + }
> +
> + done = drm_mock_sched_job_wait_scheduled(first, HZ);
> + KUNIT_ASSERT_TRUE(test, done);
> +
> + start = ktime_get();
> + i = drm_mock_sched_advance(sched, 1); /* Release the queue */
> + KUNIT_ASSERT_EQ(test, i, 1);
> +
> + done = drm_mock_sched_job_wait_finished(job,
> + usecs_to_jiffies(total_us) * 5);
> + end = ktime_get();
> + KUNIT_ASSERT_TRUE(test, done);
> +
> + pr_info("Expected %uus, actual %lldus\n",
> + total_us,
> + ktime_to_us(ktime_sub(end, start)));
> +
> + drm_mock_sched_entity_free(entity);
> +}
> +
> +static void drm_sched_scheduler_ping_pong(struct kunit *test)
> +{
> + struct drm_mock_sched_job *job, *first, *prev = NULL;
> + struct drm_mock_scheduler *sched = test->priv;
> + struct drm_mock_sched_entity *entity[2];
> + const unsigned int job_us = 1000;
> + const unsigned int jobs = 1000;
> + const unsigned int total_us = jobs * job_us;
> + ktime_t start, end;
> + bool done;
> + int i;
> +
> + /*
> + * Two entitites in inter-dependency chain.
> + *
> + * This measures the overhead of picking and processing a job at a time,
> + * where each job depends on the previous one from the diffferent
> + * entity, by comparing the ideal total "GPU" time of all submitted jobs
> + * versus the time actually taken.
> + */
> +
> + KUNIT_ASSERT_EQ(test, sched->base.credit_limit, 1);
> +
> + for (i = 0; i < ARRAY_SIZE(entity); i++)
> + entity[i] = drm_mock_sched_entity_new(test,
> + DRM_SCHED_PRIORITY_NORMAL,
> + sched);
> +
> + for (i = 0; i <= jobs; i++) {
> + job = drm_mock_sched_job_new(test, entity[i & 1]);
> + if (i == 0)
> + first = job; /* Extra first job blocks the queue */
> + else
> + drm_mock_sched_job_set_duration_us(job, job_us);
> + if (prev)
> + drm_sched_job_add_dependency(&job->base,
> + dma_fence_get(&prev->base.s_fence->finished));
> + drm_mock_sched_job_submit(job);
> + prev = job;
> + }
> +
> + done = drm_mock_sched_job_wait_scheduled(first, HZ);
> + KUNIT_ASSERT_TRUE(test, done);
> +
> + start = ktime_get();
> + i = drm_mock_sched_advance(sched, 1); /* Release the queue */
> + KUNIT_ASSERT_EQ(test, i, 1);
> +
> + done = drm_mock_sched_job_wait_finished(job,
> + usecs_to_jiffies(total_us) * 5);
> + end = ktime_get();
> + KUNIT_ASSERT_TRUE(test, done);
> +
> + pr_info("Expected %uus, actual %lldus\n",
> + total_us,
> + ktime_to_us(ktime_sub(end, start)));
> +
> + for (i = 0; i < ARRAY_SIZE(entity); i++)
> + drm_mock_sched_entity_free(entity[i]);
> +}
> +
> +static struct kunit_case drm_sched_scheduler_overhead_tests[] = {
> + KUNIT_CASE_SLOW(drm_sched_scheduler_queue_overhead),
> + KUNIT_CASE_SLOW(drm_sched_scheduler_ping_pong),
> + {}
> +};
> +
> +static struct kunit_suite drm_sched_scheduler_overhead = {
> + .name = "drm_sched_scheduler_overhead_tests",
> + .init = drm_sched_scheduler_init,
> + .exit = drm_sched_scheduler_exit,
> + .test_cases = drm_sched_scheduler_overhead_tests,
> +};
> +
> +struct drm_sched_client_params {
> + enum drm_sched_priority priority;
> + unsigned int job_cnt;
> + unsigned int job_us;
> + unsigned int wait_us;
> + bool sync;
> +};
> +
> +struct drm_sched_test_params {
> + const char *description;
> + struct drm_sched_client_params client[2];
> +};
> +
> +static const struct drm_sched_test_params drm_sched_cases[] = {
> + {
> + .description = "Normal and normal",
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 1,
> + .job_us = 8000,
> + .wait_us = 0,
> + .sync = false,
> + },
> + .client[1] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 1,
> + .job_us = 8000,
> + .wait_us = 0,
> + .sync = false,
> + },
> + },
> + {
> + .description = "Normal and low",
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 1,
> + .job_us = 8000,
> + .wait_us = 0,
> + .sync = false,
> + },
> + .client[1] = {
> + .priority = DRM_SCHED_PRIORITY_LOW,
> + .job_cnt = 1,
> + .job_us = 8000,
> + .wait_us = 0,
> + .sync = false,
> + },
> + },
> + {
> + .description = "High and normal",
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_HIGH,
> + .job_cnt = 1,
> + .job_us = 8000,
> + .wait_us = 0,
> + .sync = false,
> + },
> + .client[1] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 1,
> + .job_us = 8000,
> + .wait_us = 0,
> + .sync = false,
> + },
> + },
> + {
> + .description = "High and low",
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_HIGH,
> + .job_cnt = 1,
> + .job_us = 8000,
> + .wait_us = 0,
> + .sync = false,
> + },
> + .client[1] = {
> + .priority = DRM_SCHED_PRIORITY_LOW,
> + .job_cnt = 1,
> + .job_us = 8000,
> + .wait_us = 0,
> + .sync = false,
> + },
> + },
> + {
> + .description = "50 and 50",
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 1,
> + .job_us = 1500,
> + .wait_us = 1500,
> + .sync = true,
> + },
> + .client[1] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 1,
> + .job_us = 2500,
> + .wait_us = 2500,
> + .sync = true,
> + },
> + },
> + {
> + .description = "50 and 50 low",
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 1,
> + .job_us = 1500,
> + .wait_us = 1500,
> + .sync = true,
> + },
> + .client[1] = {
> + .priority = DRM_SCHED_PRIORITY_LOW,
> + .job_cnt = 1,
> + .job_us = 2500,
> + .wait_us = 2500,
> + .sync = true,
> + },
> + },
> + {
> + .description = "50 high and 50",
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_HIGH,
> + .job_cnt = 1,
> + .job_us = 1500,
> + .wait_us = 1500,
> + .sync = true,
> + },
> + .client[1] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 1,
> + .job_us = 2500,
> + .wait_us = 2500,
> + .sync = true,
> + },
> + },
> + {
> + .description = "Low hog and interactive",
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_LOW,
> + .job_cnt = 3,
> + .job_us = 2500,
> + .wait_us = 500,
> + .sync = false,
> + },
> + .client[1] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 1,
> + .job_us = 500,
> + .wait_us = 10000,
> + .sync = true,
> + },
> + },
> + {
> + .description = "Heavy and interactive",
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 3,
> + .job_us = 2500,
> + .wait_us = 2500,
> + .sync = true,
> + },
> + .client[1] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 1,
> + .job_us = 1000,
> + .wait_us = 9000,
> + .sync = true,
> + },
> + },
> + {
> + .description = "Very heavy and interactive",
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 4,
> + .job_us = 50000,
> + .wait_us = 1,
> + .sync = true,
> + },
> + .client[1] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 1,
> + .job_us = 1000,
> + .wait_us = 9000,
> + .sync = true,
> + },
> + },
> +};
> +
> +static void
> +drm_sched_desc(const struct drm_sched_test_params *params, char *desc)
> +{
> + strscpy(desc, params->description, KUNIT_PARAM_DESC_SIZE);
> +}
> +
> +KUNIT_ARRAY_PARAM(drm_sched_scheduler_two_clients,
> + drm_sched_cases,
> + drm_sched_desc);
> +
> +struct test_client_stats {
> + unsigned long min_us;
> + unsigned long max_us;
> + unsigned long avg_us;
> +};
> +
> +struct test_client {
> + struct kunit *test;
> +
> + struct drm_mock_sched_entity *entity;
> +
> + struct kthread_worker *worker;
> + struct kthread_work work;
> +
> + unsigned int id;
> + ktime_t duration;
> +
> + struct drm_sched_client_params params;
> +
> + ktime_t ideal_duration;
> + unsigned int cycles;
> + unsigned int cycle;
> + ktime_t start;
> + ktime_t end;
> + bool done;
> +
> + struct test_client_stats cycle_time;
> + struct test_client_stats latency_time;
> +};
> +
> +static void
> +update_stats(struct test_client_stats *stats, unsigned int n, unsigned long us)
> +{
> + if (us > stats->max_us)
> + stats->max_us = us;
> + if (us < stats->min_us)
> + stats->min_us = us;
> + stats->avg_us = DIV_ROUND_UP(n * stats->avg_us + us, n + 1);
> +}
> +
> +static void drm_sched_client_work(struct kthread_work *work)
> +{
> + struct test_client *client = container_of(work, typeof(*client), work);
> + const long sync_wait = MAX_SCHEDULE_TIMEOUT;
> + unsigned int cycle, work_us, period_us;
> + struct drm_mock_sched_job *job = NULL;
> +
> + work_us = client->params.job_cnt * client->params.job_us;
> + period_us = work_us + client->params.wait_us;
> + client->cycles = DIV_ROUND_UP(ktime_to_us(client->duration), period_us);
> + client->ideal_duration = us_to_ktime(client->cycles * period_us);
> +
> + client->start = ktime_get();
> +
> + for (cycle = 0; cycle < client->cycles; cycle++) {
> + unsigned int batch;
> + unsigned long us;
> + ktime_t t;
> +
> + if (READ_ONCE(client->done))
> + break;
> +
> + t = ktime_get();
> + for (batch = 0; batch < client->params.job_cnt; batch++) {
> + job = drm_mock_sched_job_new(client->test,
> + client->entity);
> + drm_mock_sched_job_set_duration_us(job,
> + client->params.job_us);
> + drm_mock_sched_job_submit(job);
> + }
> +
> + if (client->params.sync)
> + drm_mock_sched_job_wait_finished(job, sync_wait);
> +
> + t = ktime_sub(ktime_get(), t);
> + us = ktime_to_us(t);
> + update_stats(&client->cycle_time, cycle, us);
> + if (ktime_to_us(t) >= (long)work_us)
> + us = ktime_to_us(t) - work_us;
> + else if (WARN_ON_ONCE(client->params.sync))
> + us = 0;
> + update_stats(&client->latency_time, cycle, us);
> + WRITE_ONCE(client->cycle, cycle);
> +
> + if (READ_ONCE(client->done))
> + break;
> +
> + if (client->params.wait_us)
> + fsleep(client->params.wait_us);
> + else
> + cond_resched();
> + }
> +
> + client->done = drm_mock_sched_job_wait_finished(job, sync_wait);
> + client->end = ktime_get();
> +}
> +
> +static const char *prio_str(enum drm_sched_priority prio)
> +{
> + switch (prio) {
> + case DRM_SCHED_PRIORITY_KERNEL:
> + return "kernel";
> + case DRM_SCHED_PRIORITY_LOW:
> + return "low";
> + case DRM_SCHED_PRIORITY_NORMAL:
> + return "normal";
> + case DRM_SCHED_PRIORITY_HIGH:
> + return "high";
> + default:
> + return "???";
> + }
> +}
> +
> +static void drm_sched_scheduler_two_clients_test(struct kunit *test)
> +{
> + const struct drm_sched_test_params *params = test->param_value;
> + struct drm_mock_scheduler *sched = test->priv;
> + struct test_client client[2] = { };
> + unsigned int prev_cycle[2] = { };
> + unsigned int i, j;
> + ktime_t start;
> +
> + /*
> + * Same job stream from two clients.
> + */
> +
> + for (i = 0; i < ARRAY_SIZE(client); i++)
> + client[i].entity =
> + drm_mock_sched_entity_new(test,
> + params->client[i].priority,
> + sched);
> +
> + for (i = 0; i < ARRAY_SIZE(client); i++) {
> + client[i].test = test;
> + client[i].id = i;
> + client[i].duration = ms_to_ktime(1000);
> + client[i].params = params->client[i];
> + client[i].cycle_time.min_us = ~0UL;
> + client[i].latency_time.min_us = ~0UL;
> + client[i].worker =
> + kthread_create_worker(0, "%s-%u", __func__, i);
> + if (IS_ERR(client[i].worker)) {
> + for (j = 0; j < i; j++)
> + kthread_destroy_worker(client[j].worker);
> + KUNIT_FAIL(test, "Failed to create worker!\n");
> + }
> +
> + kthread_init_work(&client[i].work, drm_sched_client_work);
> + }
> +
> + for (i = 0; i < ARRAY_SIZE(client); i++)
> + kthread_queue_work(client[i].worker, &client[i].work);
> +
> + /*
> + * The clients (workers) can be a mix of async (deep submission queue),
> + * sync (one job at a time), or something in between. Therefore it is
> + * difficult to display a single metric representing their progress.
> + *
> + * Each struct drm_sched_client_params describes the actual submission
> + * pattern which happens in the following steps:
> + * 1. Submit N jobs
> + * 2. Wait for last submitted job to finish
> + * 3. Sleep for U micro-seconds
> + * 4. Goto 1. for C cycles
> + *
> + * Where number of cycles is calculated to match the target client
> + * duration from the respective struct drm_sched_test_params.
> + *
> + * To assess scheduling behaviour what we output for both clients is:
> + * - pct: Percentage progress of the jobs submitted
> + * - cps: "Cycles" per second (where one cycle is one 1.-4. above)
> + * - qd: Number of outstanding jobs in the client/entity
> + */
> +
> + start = ktime_get();
> + pr_info("%s:\n\t pct1 cps1 qd1; pct2 cps2 qd2\n",
> + params->description);
> + while (!READ_ONCE(client[0].done) || !READ_ONCE(client[1].done)) {
> + unsigned int pct[2], qd[2], cycle[2], cps[2];
> +
> + for (i = 0; i < ARRAY_SIZE(client); i++) {
> + qd[i] = spsc_queue_count(&client[i].entity->base.job_queue);
> + cycle[i] = READ_ONCE(client[i].cycle);
> + cps[i] = DIV_ROUND_UP(1000 * (cycle[i] - prev_cycle[i]),
> + 100);
> + if (client[i].cycles)
> + pct[i] = DIV_ROUND_UP(100 * (1 + cycle[i]),
> + client[i].cycles);
> + else
> + pct[i] = 0;
> + prev_cycle[i] = cycle[i];
> + }
> +
> + if (READ_ONCE(client[0].done))
> + pr_info("\t+%6lldms: ; %3u %5u %4u\n",
> + ktime_to_ms(ktime_sub(ktime_get(), start)),
> + pct[1], cps[1], qd[1]);
> + else if (READ_ONCE(client[1].done))
> + pr_info("\t+%6lldms: %3u %5u %4u;\n",
> + ktime_to_ms(ktime_sub(ktime_get(), start)),
> + pct[0], cps[0], qd[0]);
> + else
> + pr_info("\t+%6lldms: %3u %5u %4u; %3u %5u %4u\n",
> + ktime_to_ms(ktime_sub(ktime_get(), start)),
> + pct[0], cps[0], qd[0],
> + pct[1], cps[1], qd[1]);
> + msleep(100);
> + }
> +
> + for (i = 0; i < ARRAY_SIZE(client); i++) {
> + kthread_flush_work(&client[i].work);
> + kthread_destroy_worker(client[i].worker);
> + }
> +
> + for (i = 0; i < ARRAY_SIZE(client); i++)
> + KUNIT_ASSERT_TRUE(test, client[i].done);
> +
> + for (i = 0; i < ARRAY_SIZE(client); i++) {
> + pr_info(" %u: prio=%s sync=%u elapsed_ms=%lldms (ideal_ms=%lldms) cycle_time(min,avg,max)=%lu,%lu,%lu us latency_time(min,avg,max)=%lu,%lu,%lu us",
> + i,
> + prio_str(params->client[i].priority),
> + params->client[i].sync,
> + ktime_to_ms(ktime_sub(client[i].end, client[i].start)),
> + ktime_to_ms(client[i].ideal_duration),
> + client[i].cycle_time.min_us,
> + client[i].cycle_time.avg_us,
> + client[i].cycle_time.max_us,
> + client[i].latency_time.min_us,
> + client[i].latency_time.avg_us,
> + client[i].latency_time.max_us);
> + drm_mock_sched_entity_free(client[i].entity);
> + }
> +}
> +
> +static const struct kunit_attributes drm_sched_scheduler_two_clients_attr = {
> + .speed = KUNIT_SPEED_SLOW,
> +};
> +
> +static struct kunit_case drm_sched_scheduler_two_clients_tests[] = {
> + KUNIT_CASE_PARAM_ATTR(drm_sched_scheduler_two_clients_test,
> + drm_sched_scheduler_two_clients_gen_params,
> + drm_sched_scheduler_two_clients_attr),
> + {}
> +};
> +
> +static struct kunit_suite drm_sched_scheduler_two_clients1 = {
> + .name = "drm_sched_scheduler_two_clients_one_credit_tests",
> + .init = drm_sched_scheduler_init,
> + .exit = drm_sched_scheduler_exit,
> + .test_cases = drm_sched_scheduler_two_clients_tests,
> +};
> +
> +static struct kunit_suite drm_sched_scheduler_two_clients2 = {
> + .name = "drm_sched_scheduler_two_clients_two_credits_tests",
> + .init = drm_sched_scheduler_init2,
> + .exit = drm_sched_scheduler_exit,
> + .test_cases = drm_sched_scheduler_two_clients_tests,
> +};
> +
> +kunit_test_suites(&drm_sched_scheduler_overhead,
> + &drm_sched_scheduler_two_clients1,
> + &drm_sched_scheduler_two_clients2);
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 02/16] drm/sched: Add some more scheduling quality unit tests
2025-04-25 10:20 ` [RFC v4 02/16] drm/sched: Add some more " Tvrtko Ursulin
@ 2025-04-29 15:07 ` Christian König
0 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2025-04-29 15:07 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Danilo Krummrich, Matthew Brost, Philipp Stanner,
Pierre-Eric Pelloux-Prayer
On 4/25/25 12:20, Tvrtko Ursulin wrote:
> This time round we explore the rate of submitted job queue processing
> with multiple identical parallel clients.
>
> Example test output:
>
> 3 clients:
> t cycle: min avg max : ...
> + 0ms 0 0 0 : 0 0 0
> + 102ms 2 2 2 : 2 2 2
> + 208ms 5 6 6 : 6 5 5
> + 310ms 8 9 9 : 9 9 8
> ...
> + 2616ms 82 83 83 : 83 83 82
> + 2717ms 83 83 83 : 83 83 83
> avg_max_min_delta(x100)=60
>
> Every 100ms for the duration of the test the test logs how many jobs
> each client had completed, prefixed by the minimum, average and maximum
> numbers. When finished, the overall average delta between max and min
> is output as a rough indicator of scheduling fairness.
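(As a worked illustration with made-up numbers: if the per-sample max-min
deltas summed to 6 over 10 sampling loops, the test would print
DIV_ROUND_UP(6 * 100, 10) = 60, i.e. the "avg_max_min_delta(x100)=60"
shown in the example output above.)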
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
> ---
> .../gpu/drm/scheduler/tests/tests_scheduler.c | 186 +++++++++++++++++-
> 1 file changed, 185 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/tests/tests_scheduler.c b/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
> index b66321ef7abe..d70b47d7bf7a 100644
> --- a/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/tests/tests_scheduler.c
> @@ -181,6 +181,7 @@ struct drm_sched_client_params {
>
> struct drm_sched_test_params {
> const char *description;
> + unsigned int num_clients;
> struct drm_sched_client_params client[2];
> };
>
> @@ -626,6 +627,189 @@ static struct kunit_suite drm_sched_scheduler_two_clients2 = {
> .test_cases = drm_sched_scheduler_two_clients_tests,
> };
>
> +
> +static const struct drm_sched_test_params drm_sched_many_cases[] = {
> + {
> + .description = "2 clients",
> + .num_clients = 2,
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 4,
> + .job_us = 1000,
> + .wait_us = 0,
> + .sync = true,
> + },
> + },
> + {
> + .description = "3 clients",
> + .num_clients = 3,
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 4,
> + .job_us = 1000,
> + .wait_us = 0,
> + .sync = true,
> + },
> + },
> + {
> + .description = "7 clients",
> + .num_clients = 7,
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 4,
> + .job_us = 1000,
> + .wait_us = 0,
> + .sync = true,
> + },
> + },
> + {
> + .description = "13 clients",
> + .num_clients = 13,
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 4,
> + .job_us = 1000,
> + .wait_us = 0,
> + .sync = true,
> + },
> + },
> + {
> + .description = "31 clients",
> + .num_clients = 31,
> + .client[0] = {
> + .priority = DRM_SCHED_PRIORITY_NORMAL,
> + .job_cnt = 2,
> + .job_us = 1000,
> + .wait_us = 0,
> + .sync = true,
> + },
> + },
> +};
> +
> +KUNIT_ARRAY_PARAM(drm_sched_scheduler_many_clients,
> + drm_sched_many_cases,
> + drm_sched_desc);
> +
> +static void drm_sched_scheduler_many_clients_test(struct kunit *test)
> +{
> + const struct drm_sched_test_params *params = test->param_value;
> + struct drm_mock_scheduler *sched = test->priv;
> + const unsigned int clients = params->num_clients;
> + unsigned int i, j, delta_total = 0, loops = 0;
> + struct test_client *client;
> + unsigned int *prev_cycle;
> + ktime_t start;
> + char *buf;
> +
> + /*
> + * Many clients with deep-ish async queues.
> + */
> +
> + buf = kunit_kmalloc(test, PAGE_SIZE, GFP_KERNEL);
> + client = kunit_kcalloc(test, clients, sizeof(*client), GFP_KERNEL);
> + prev_cycle = kunit_kcalloc(test, clients, sizeof(*prev_cycle),
> + GFP_KERNEL);
> +
> + for (i = 0; i < clients; i++)
> + client[i].entity =
> + drm_mock_sched_entity_new(test,
> + DRM_SCHED_PRIORITY_NORMAL,
> + sched);
> +
> + for (i = 0; i < clients; i++) {
> + client[i].test = test;
> + client[i].id = i;
> + client[i].params = params->client[0];
> + client[i].duration = ms_to_ktime(1000 / clients);
> + client[i].cycle_time.min_us = ~0UL;
> + client[i].latency_time.min_us = ~0UL;
> + client[i].worker =
> + kthread_create_worker(0, "%s-%u", __func__, i);
> + if (IS_ERR(client[i].worker)) {
> + for (j = 0; j < i; j++)
> + kthread_destroy_worker(client[j].worker);
> + KUNIT_FAIL(test, "Failed to create worker!\n");
> + }
> +
> + kthread_init_work(&client[i].work, drm_sched_client_work);
> + }
> +
> + for (i = 0; i < clients; i++)
> + kthread_queue_work(client[i].worker, &client[i].work);
> +
> + start = ktime_get();
> + pr_info("%u clients:\n\tt\t\tcycle:\t min avg max : ...\n", clients);
> + for (;;) {
> + unsigned int min = ~0;
> + unsigned int max = 0;
> + unsigned int total = 0;
> + bool done = true;
> + char pbuf[16];
> +
> + memset(buf, 0, PAGE_SIZE);
> + for (i = 0; i < clients; i++) {
> + unsigned int cycle, cycles;
> +
> + cycle = READ_ONCE(client[i].cycle);
> + cycles = READ_ONCE(client[i].cycles);
> +
> + snprintf(pbuf, sizeof(pbuf), " %3d", cycle);
> + strncat(buf, pbuf, PAGE_SIZE);
> +
> + total += cycle;
> + if (cycle < min)
> + min = cycle;
> + if (cycle > max)
> + max = cycle;
> +
> + if (!min || (cycle + 1) < cycles)
> + done = false;
> + }
> +
> + loops++;
> + delta_total += max - min;
> +
> + pr_info("\t+%6lldms\t\t %3u %3u %3u :%s\n",
> + ktime_to_ms(ktime_sub(ktime_get(), start)),
> + min, DIV_ROUND_UP(total, clients), max, buf);
> +
> + if (done)
> + break;
> +
> + msleep(100);
> + }
> +
> + pr_info(" avg_max_min_delta(x100)=%u\n",
> + loops ? DIV_ROUND_UP(delta_total * 100, loops) : 0);
> +
> + for (i = 0; i < clients; i++) {
> + kthread_flush_work(&client[i].work);
> + kthread_destroy_worker(client[i].worker);
> + }
> +
> + for (i = 0; i < clients; i++)
> + drm_mock_sched_entity_free(client[i].entity);
> +}
> +
> +static const struct kunit_attributes drm_sched_scheduler_many_clients_attr = {
> + .speed = KUNIT_SPEED_SLOW,
> +};
> +
> +static struct kunit_case drm_sched_scheduler_many_clients_tests[] = {
> + KUNIT_CASE_PARAM_ATTR(drm_sched_scheduler_many_clients_test,
> + drm_sched_scheduler_many_clients_gen_params,
> + drm_sched_scheduler_many_clients_attr),
> + {}
> +};
> +
> +static struct kunit_suite drm_sched_scheduler_many_clients = {
> + .name = "drm_sched_scheduler_many_clients_tests",
> + .init = drm_sched_scheduler_init2,
> + .exit = drm_sched_scheduler_exit,
> + .test_cases = drm_sched_scheduler_many_clients_tests,
> +};
> +
> kunit_test_suites(&drm_sched_scheduler_overhead,
> &drm_sched_scheduler_two_clients1,
> - &drm_sched_scheduler_two_clients2);
> + &drm_sched_scheduler_two_clients2,
> + &drm_sched_scheduler_many_clients);
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 03/16] drm/sched: De-clutter drm_sched_init
2025-04-25 10:20 ` [RFC v4 03/16] drm/sched: De-clutter drm_sched_init Tvrtko Ursulin
@ 2025-04-29 15:16 ` Christian König
0 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2025-04-29 15:16 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Danilo Krummrich, Matthew Brost, Philipp Stanner
On 4/25/25 12:20, Tvrtko Ursulin wrote:
> Move work queue allocation into a helper for a more streamlined function
> body.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 27 +++++++++++++-------------
> 1 file changed, 14 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index ca5028f7a4e9..86e40157b09b 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -83,12 +83,6 @@
> #define CREATE_TRACE_POINTS
> #include "gpu_scheduler_trace.h"
>
> -#ifdef CONFIG_LOCKDEP
> -static struct lockdep_map drm_sched_lockdep_map = {
> - .name = "drm_sched_lockdep_map"
> -};
> -#endif
> -
> int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
>
> /**
> @@ -1258,6 +1252,19 @@ static void drm_sched_run_job_work(struct work_struct *w)
> drm_sched_run_job_queue(sched);
> }
>
> +static struct workqueue_struct *drm_sched_alloc_wq(const char *name)
> +{
> +#if (IS_ENABLED(CONFIG_LOCKDEP))
> + static struct lockdep_map map = {
> + .name = "drm_sched_lockdep_map"
> + };
> +
> + return alloc_ordered_workqueue_lockdep_map(name, WQ_MEM_RECLAIM, &map);
Some comment on why we have a separate lockdep map would be really nice to have here.
Apart from that looks good to me,
Christian.
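As an illustration only (not part of the patch, and assuming the rationale
is the one from the original lockdep map change, namely sharing a single
lockdep class rather than registering a new one for every scheduler's
ordered workqueue), such a comment could read roughly:

	/*
	 * A single static lockdep map is shared by all schedulers which
	 * allocate their own ordered submit workqueue, so that repeated
	 * scheduler creation and destruction does not keep registering
	 * new lockdep classes.
	 */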
> +#else
> + return alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
> +#endif
> +}
> +
> /**
> * drm_sched_init - Init a gpu scheduler instance
> *
> @@ -1298,13 +1305,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
> sched->submit_wq = args->submit_wq;
> sched->own_submit_wq = false;
> } else {
> -#ifdef CONFIG_LOCKDEP
> - sched->submit_wq = alloc_ordered_workqueue_lockdep_map(args->name,
> - WQ_MEM_RECLAIM,
> - &drm_sched_lockdep_map);
> -#else
> - sched->submit_wq = alloc_ordered_workqueue(args->name, WQ_MEM_RECLAIM);
> -#endif
> + sched->submit_wq = drm_sched_alloc_wq(args->name);
> if (!sched->submit_wq)
> return -ENOMEM;
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 01/16] drm/sched: Add some scheduling quality unit tests
2025-04-29 15:03 ` Christian König
@ 2025-04-29 15:45 ` Michel Dänzer
2025-04-29 15:52 ` Christian König
0 siblings, 1 reply; 35+ messages in thread
From: Michel Dänzer @ 2025-04-29 15:45 UTC (permalink / raw)
To: Christian König, Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Danilo Krummrich, Matthew Brost, Philipp Stanner,
Pierre-Eric Pelloux-Prayer
On 2025-04-29 17:03, Christian König wrote:
> On 4/25/25 12:20, Tvrtko Ursulin wrote:
>> Note that the PCT and CPS metrics are irrelevant for "unsync" clients
>> since they manage to complete all of their cycles instantenuously.
>
> Typo instantanuously.
Make that "instantaneously".
--
Earthling Michel Dänzer \ GNOME / Xwayland / Mesa developer
https://redhat.com \ Libre software enthusiast
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 01/16] drm/sched: Add some scheduling quality unit tests
2025-04-29 15:45 ` Michel Dänzer
@ 2025-04-29 15:52 ` Christian König
0 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2025-04-29 15:52 UTC (permalink / raw)
To: Michel Dänzer, Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Danilo Krummrich, Matthew Brost, Philipp Stanner,
Pierre-Eric Pelloux-Prayer
On 4/29/25 17:45, Michel Dänzer wrote:
> On 2025-04-29 17:03, Christian König wrote:
>> On 4/25/25 12:20, Tvrtko Ursulin wrote:
>>> Note that the PCT and CPS metrics are irrelevant for "unsync" clients
>>> since they manage to complete all of their cycles instantenuously.
>>
>> Typo instantanuously.
>
> Make that "instantaneously".
So much for the quality of autocorrect :)
Cheers,
Christian.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 04/16] drm/sched: Avoid double re-lock on the job free path
2025-04-25 10:20 ` [RFC v4 04/16] drm/sched: Avoid double re-lock on the job free path Tvrtko Ursulin
@ 2025-05-12 12:49 ` Philipp Stanner
2025-05-12 12:57 ` Matthew Brost
2025-05-14 8:46 ` Tvrtko Ursulin
0 siblings, 2 replies; 35+ messages in thread
From: Philipp Stanner @ 2025-05-12 12:49 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost,
Philipp Stanner
On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
> Currently the job free work item will lock sched->job_list_lock a first
> time to see if there are any jobs, free a single job, and then lock
> again to decide whether to re-queue itself if there are more finished
> jobs.
>
> Since drm_sched_get_finished_job() already looks at the second job in
> the queue we can simply add the signaled check and have it return the
> presence of more jobs to free to the caller. That way the work item
> does not have to lock the list again and repeat the signaled check.
Are you convinced that this is worth it?
I'm torn. It's rare that one returns a status through a boolean by
reference.
Independently of that, this is a candidate which could certainly be
branched out from this series, to make the series completely about the
new scheduling policy rather than about other general improvements.
P.
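(For reference, the shape which patch 10 later in this series moves to
drops the out-parameter altogether by freeing every signaled job in one
worker invocation:

	while ((job = drm_sched_get_finished_job(sched)))
		sched->ops->free_job(job);

though, as discussed further down the thread, freeing more than one job
per invocation is itself contentious.)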
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 39 +++++++++++-------------
> --
> 1 file changed, 16 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 86e40157b09b..a45b02fd2af3 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -365,22 +365,6 @@ static void __drm_sched_run_free_queue(struct
> drm_gpu_scheduler *sched)
> queue_work(sched->submit_wq, &sched->work_free_job);
> }
>
> -/**
> - * drm_sched_run_free_queue - enqueue free-job work if ready
> - * @sched: scheduler instance
> - */
> -static void drm_sched_run_free_queue(struct drm_gpu_scheduler
> *sched)
> -{
> - struct drm_sched_job *job;
> -
> - spin_lock(&sched->job_list_lock);
> - job = list_first_entry_or_null(&sched->pending_list,
> - struct drm_sched_job, list);
> - if (job && dma_fence_is_signaled(&job->s_fence->finished))
> - __drm_sched_run_free_queue(sched);
> - spin_unlock(&sched->job_list_lock);
> -}
> -
> /**
> * drm_sched_job_done - complete a job
> * @s_job: pointer to the job which is done
> @@ -1097,12 +1081,13 @@ drm_sched_select_entity(struct
> drm_gpu_scheduler *sched)
> * drm_sched_get_finished_job - fetch the next finished job to be
> destroyed
> *
> * @sched: scheduler instance
> + * @have_more: are there more finished jobs on the list
> *
> * Returns the next finished job from the pending list (if there is
> one)
> * ready for it to be destroyed.
> */
> static struct drm_sched_job *
> -drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
> +drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool
> *have_more)
> {
> struct drm_sched_job *job, *next;
>
> @@ -1110,22 +1095,27 @@ drm_sched_get_finished_job(struct
> drm_gpu_scheduler *sched)
>
> job = list_first_entry_or_null(&sched->pending_list,
> struct drm_sched_job, list);
> -
> if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
> /* remove job from pending_list */
> list_del_init(&job->list);
>
> /* cancel this job's TO timer */
> cancel_delayed_work(&sched->work_tdr);
> - /* make the scheduled timestamp more accurate */
> +
> + *have_more = false;
> next = list_first_entry_or_null(&sched-
> >pending_list,
> typeof(*next),
> list);
> -
> if (next) {
> + /* make the scheduled timestamp more
> accurate */
> if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
> &next->s_fence-
> >scheduled.flags))
> next->s_fence->scheduled.timestamp =
> dma_fence_timestamp(&job-
> >s_fence->finished);
> +
> + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> + &next->s_fence-
> >finished.flags))
> + *have_more = true;
> +
> /* start TO timer for next job */
> drm_sched_start_timeout(sched);
> }
> @@ -1184,12 +1174,15 @@ static void drm_sched_free_job_work(struct
> work_struct *w)
> struct drm_gpu_scheduler *sched =
> container_of(w, struct drm_gpu_scheduler,
> work_free_job);
> struct drm_sched_job *job;
> + bool have_more;
>
> - job = drm_sched_get_finished_job(sched);
> - if (job)
> + job = drm_sched_get_finished_job(sched, &have_more);
> + if (job) {
> sched->ops->free_job(job);
> + if (have_more)
> + __drm_sched_run_free_queue(sched);
> + }
>
> - drm_sched_run_free_queue(sched);
> drm_sched_run_job_queue(sched);
> }
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 05/16] drm/sched: Consolidate drm_sched_job_timedout
2025-04-25 10:20 ` [RFC v4 05/16] drm/sched: Consolidate drm_sched_job_timedout Tvrtko Ursulin
@ 2025-05-12 12:53 ` Philipp Stanner
2025-05-14 8:57 ` Tvrtko Ursulin
0 siblings, 1 reply; 35+ messages in thread
From: Philipp Stanner @ 2025-05-12 12:53 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost,
Philipp Stanner
On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
> Reduce to one spin_unlock for a hopefully slightly clearer flow in the
> function. It may appear that there is a behavioural change, with
> drm_sched_start_timeout_unlocked() now not being called if there were
> initially no jobs on the pending list and some appeared only after the
> unlock. However, if the code relied on the TDR handler restarting
> itself, it would already fail to do so whenever a job arrived on the
> pending list after the check.
>
> Also fix one stale comment while touching the function.
Same here, that's a good candidate for a separate patch / series.
P.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 37 +++++++++++++-----------
> --
> 1 file changed, 18 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index a45b02fd2af3..a26cc11c8ade 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -516,38 +516,37 @@ static void drm_sched_job_begin(struct
> drm_sched_job *s_job)
>
> static void drm_sched_job_timedout(struct work_struct *work)
> {
> - struct drm_gpu_scheduler *sched;
> + struct drm_gpu_scheduler *sched =
> + container_of(work, struct drm_gpu_scheduler,
> work_tdr.work);
> + enum drm_gpu_sched_stat status;
> struct drm_sched_job *job;
> - enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL;
> -
> - sched = container_of(work, struct drm_gpu_scheduler,
> work_tdr.work);
>
> /* Protects against concurrent deletion in
> drm_sched_get_finished_job */
> spin_lock(&sched->job_list_lock);
> job = list_first_entry_or_null(&sched->pending_list,
> struct drm_sched_job, list);
> -
> if (job) {
> /*
> * Remove the bad job so it cannot be freed by
> concurrent
> - * drm_sched_cleanup_jobs. It will be reinserted
> back after sched->thread
> - * is parked at which point it's safe.
> + * drm_sched_get_finished_job. It will be reinserted
> back after
> + * scheduler worker is stopped at which point it's
> safe.
> */
> list_del_init(&job->list);
> - spin_unlock(&sched->job_list_lock);
> + }
> + spin_unlock(&sched->job_list_lock);
>
> - status = job->sched->ops->timedout_job(job);
> + if (!job)
> + return;
>
> - /*
> - * Guilty job did complete and hence needs to be
> manually removed
> - * See drm_sched_stop doc.
> - */
> - if (sched->free_guilty) {
> - job->sched->ops->free_job(job);
> - sched->free_guilty = false;
> - }
> - } else {
> - spin_unlock(&sched->job_list_lock);
> + status = job->sched->ops->timedout_job(job);
> +
> + /*
> + * Guilty job did complete and hence needs to be manually
> removed. See
> + * documentation for drm_sched_stop.
> + */
> + if (sched->free_guilty) {
> + job->sched->ops->free_job(job);
> + sched->free_guilty = false;
> }
>
> if (status != DRM_GPU_SCHED_STAT_ENODEV)
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 10/16] drm/sched: Free all finished jobs at once
2025-04-25 10:20 ` [RFC v4 10/16] drm/sched: Free all finished jobs at once Tvrtko Ursulin
@ 2025-05-12 12:56 ` Philipp Stanner
2025-05-14 9:00 ` Tvrtko Ursulin
0 siblings, 1 reply; 35+ messages in thread
From: Philipp Stanner @ 2025-05-12 12:56 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost,
Philipp Stanner
On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
> To implement fair scheduling we will need an as accurate as possible
> view into per-entity GPU time utilisation. Because sched fence
> execution times are only adjusted for accuracy in the free worker, we
> need to process completed jobs as soon as possible so the metric is as
> up to date as possible when viewed from the submission side of things.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 15 ++-------------
> 1 file changed, 2 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 8950c7705f57..22428a1569dd 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -865,13 +865,12 @@ drm_sched_select_entity(struct
> drm_gpu_scheduler *sched)
> * drm_sched_get_finished_job - fetch the next finished job to be
> destroyed
> *
> * @sched: scheduler instance
> - * @have_more: are there more finished jobs on the list
> *
> * Returns the next finished job from the pending list (if there is
> one)
> * ready for it to be destroyed.
> */
> static struct drm_sched_job *
> -drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool
> *have_more)
> +drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
> {
> struct drm_sched_job *job, *next;
>
> @@ -886,7 +885,6 @@ drm_sched_get_finished_job(struct
> drm_gpu_scheduler *sched, bool *have_more)
> /* cancel this job's TO timer */
> cancel_delayed_work(&sched->work_tdr);
>
> - *have_more = false;
> next = list_first_entry_or_null(&sched-
> >pending_list,
> typeof(*next),
> list);
> if (next) {
> @@ -896,10 +894,6 @@ drm_sched_get_finished_job(struct
> drm_gpu_scheduler *sched, bool *have_more)
> next->s_fence->scheduled.timestamp =
> dma_fence_timestamp(&job-
> >s_fence->finished);
>
> - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> - &next->s_fence-
> >finished.flags))
> - *have_more = true;
> -
> /* start TO timer for next job */
> drm_sched_start_timeout(sched);
> }
> @@ -958,14 +952,9 @@ static void drm_sched_free_job_work(struct
> work_struct *w)
> struct drm_gpu_scheduler *sched =
> container_of(w, struct drm_gpu_scheduler,
> work_free_job);
> struct drm_sched_job *job;
> - bool have_more;
>
> - job = drm_sched_get_finished_job(sched, &have_more);
> - if (job) {
> + while ((job = drm_sched_get_finished_job(sched)))
> sched->ops->free_job(job);
> - if (have_more)
> - __drm_sched_run_free_queue(sched);
> - }
Are there any have_more users left after that?
Removing here what was added earlier in the series IMO makes it more
questionable whether that improvement was worth adding in the first place.
P.
>
> drm_sched_run_job_queue(sched);
> }
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 04/16] drm/sched: Avoid double re-lock on the job free path
2025-05-12 12:49 ` Philipp Stanner
@ 2025-05-12 12:57 ` Matthew Brost
2025-05-14 8:54 ` Tvrtko Ursulin
2025-05-14 8:46 ` Tvrtko Ursulin
1 sibling, 1 reply; 35+ messages in thread
From: Matthew Brost @ 2025-05-12 12:57 UTC (permalink / raw)
To: phasta
Cc: Tvrtko Ursulin, amd-gfx, dri-devel, kernel-dev,
Christian König, Danilo Krummrich
On Mon, May 12, 2025 at 02:49:55PM +0200, Philipp Stanner wrote:
> On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
> > Currently the job free work item will lock sched->job_list_lock first
> > time
> > to see if there are any jobs, free a single job, and then lock again
> > to
> > decide whether to re-queue itself if there are more finished jobs.
> >
> > Since drm_sched_get_finished_job() already looks at the second job in
> > the
> > queue we can simply add the signaled check and have it return the
> > presence
> > of more jobs to free to the caller. That way the work item does not
> > have
> > to lock the list again and repeat the signaled check.
>
> Are you convinced that this is worth it?
>
> I'm torn. It's rare that one returns a status through a boolean by
> reference.
>
I'd say no to this (micro-optimization) and to freeing / running more
than one job per worker invocation. The latter was rejected in the
original work queue conversion.
Matt
>
> Independently from that, this is a candidate which certainly can be
> branched out from this series, to make the series completely about the
> new scheduling policy, not general other improvements.
>
>
> P.
>
> >
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> > Cc: Christian König <christian.koenig@amd.com>
> > Cc: Danilo Krummrich <dakr@kernel.org>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: Philipp Stanner <phasta@kernel.org>
> > ---
> > drivers/gpu/drm/scheduler/sched_main.c | 39 +++++++++++-------------
> > --
> > 1 file changed, 16 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index 86e40157b09b..a45b02fd2af3 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -365,22 +365,6 @@ static void __drm_sched_run_free_queue(struct
> > drm_gpu_scheduler *sched)
> > queue_work(sched->submit_wq, &sched->work_free_job);
> > }
> >
> > -/**
> > - * drm_sched_run_free_queue - enqueue free-job work if ready
> > - * @sched: scheduler instance
> > - */
> > -static void drm_sched_run_free_queue(struct drm_gpu_scheduler
> > *sched)
> > -{
> > - struct drm_sched_job *job;
> > -
> > - spin_lock(&sched->job_list_lock);
> > - job = list_first_entry_or_null(&sched->pending_list,
> > - struct drm_sched_job, list);
> > - if (job && dma_fence_is_signaled(&job->s_fence->finished))
> > - __drm_sched_run_free_queue(sched);
> > - spin_unlock(&sched->job_list_lock);
> > -}
> > -
> > /**
> > * drm_sched_job_done - complete a job
> > * @s_job: pointer to the job which is done
> > @@ -1097,12 +1081,13 @@ drm_sched_select_entity(struct
> > drm_gpu_scheduler *sched)
> > * drm_sched_get_finished_job - fetch the next finished job to be
> > destroyed
> > *
> > * @sched: scheduler instance
> > + * @have_more: are there more finished jobs on the list
> > *
> > * Returns the next finished job from the pending list (if there is
> > one)
> > * ready for it to be destroyed.
> > */
> > static struct drm_sched_job *
> > -drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
> > +drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool
> > *have_more)
> > {
> > struct drm_sched_job *job, *next;
> >
> > @@ -1110,22 +1095,27 @@ drm_sched_get_finished_job(struct
> > drm_gpu_scheduler *sched)
> >
> > job = list_first_entry_or_null(&sched->pending_list,
> > struct drm_sched_job, list);
> > -
> > if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
> > /* remove job from pending_list */
> > list_del_init(&job->list);
> >
> > /* cancel this job's TO timer */
> > cancel_delayed_work(&sched->work_tdr);
> > - /* make the scheduled timestamp more accurate */
> > +
> > + *have_more = false;
> > next = list_first_entry_or_null(&sched-
> > >pending_list,
> > typeof(*next),
> > list);
> > -
> > if (next) {
> > + /* make the scheduled timestamp more
> > accurate */
> > if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
> > &next->s_fence-
> > >scheduled.flags))
> > next->s_fence->scheduled.timestamp =
> > dma_fence_timestamp(&job-
> > >s_fence->finished);
> > +
> > + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > + &next->s_fence-
> > >finished.flags))
> > + *have_more = true;
> > +
> > /* start TO timer for next job */
> > drm_sched_start_timeout(sched);
> > }
> > @@ -1184,12 +1174,15 @@ static void drm_sched_free_job_work(struct
> > work_struct *w)
> > struct drm_gpu_scheduler *sched =
> > container_of(w, struct drm_gpu_scheduler,
> > work_free_job);
> > struct drm_sched_job *job;
> > + bool have_more;
> >
> > - job = drm_sched_get_finished_job(sched);
> > - if (job)
> > + job = drm_sched_get_finished_job(sched, &have_more);
> > + if (job) {
> > sched->ops->free_job(job);
> > + if (have_more)
> > + __drm_sched_run_free_queue(sched);
> > + }
> >
> > - drm_sched_run_free_queue(sched);
> > drm_sched_run_job_queue(sched);
> > }
> >
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 12/16] drm/sched: Remove idle entity from tree
2025-04-25 10:20 ` [RFC v4 12/16] drm/sched: Remove idle entity from tree Tvrtko Ursulin
@ 2025-05-12 13:03 ` Philipp Stanner
2025-05-14 9:22 ` Tvrtko Ursulin
0 siblings, 1 reply; 35+ messages in thread
From: Philipp Stanner @ 2025-05-12 13:03 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost,
Philipp Stanner
On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
> There is no need to keep entities with no jobs in the tree so let's
> remove them once the last job is consumed. This keeps the tree smaller,
> which is nicer and more efficient as entities are removed and re-added
> on every popped job.
That there is no need to do so doesn't imply that you can't keep them
around. The commit message doesn't make the motivation clear.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> ---
> drivers/gpu/drm/scheduler/sched_rq.c | 24 +++++++++++++-----------
> 1 file changed, 13 insertions(+), 11 deletions(-)
Since this doesn't simplify the code base, I think the only
justification would be a somewhat decent performance gain. Does this
patch result in that?
Otherwise it's probably better to keep git-blame intact here.
P.
>
> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c
> b/drivers/gpu/drm/scheduler/sched_rq.c
> index d477a027feb9..2cde89cf25fb 100644
> --- a/drivers/gpu/drm/scheduler/sched_rq.c
> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
> @@ -149,25 +149,27 @@ void drm_sched_rq_pop_entity(struct
> drm_sched_entity *entity)
> {
> struct drm_sched_job *next_job;
> struct drm_sched_rq *rq;
> - ktime_t ts;
>
> /*
> * Update the entity's location in the min heap according to
> * the timestamp of the next job, if any.
> */
> + spin_lock(&entity->lock);
> + rq = entity->rq;
> + spin_lock(&rq->lock);
> next_job = drm_sched_entity_queue_peek(entity);
> - if (!next_job)
> - return;
> + if (next_job) {
> + ktime_t ts;
>
> - if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> - ts = next_job->submit_ts;
> - else
> - ts = drm_sched_rq_get_rr_deadline(rq);
> + if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> + ts = next_job->submit_ts;
> + else
> + ts = drm_sched_rq_get_rr_deadline(rq);
>
> - spin_lock(&entity->lock);
> - rq = entity->rq;
> - spin_lock(&rq->lock);
> - drm_sched_rq_update_fifo_locked(entity, rq, ts);
> + drm_sched_rq_update_fifo_locked(entity, rq, ts);
> + } else {
> + drm_sched_rq_remove_fifo_locked(entity, rq);
> + }
> spin_unlock(&rq->lock);
> spin_unlock(&entity->lock);
> }
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 16/16] drm/sched: Embed run queue singleton into the scheduler
2025-04-25 10:20 ` [RFC v4 16/16] drm/sched: Embed run queue singleton into the scheduler Tvrtko Ursulin
@ 2025-05-12 13:05 ` Philipp Stanner
0 siblings, 0 replies; 35+ messages in thread
From: Philipp Stanner @ 2025-05-12 13:05 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost,
Philipp Stanner
On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
> Now that the run queue to scheduler relationship is always 1:1 we can
> embed it (the run queue) directly in the scheduler struct and save on
> some allocation error handling code and such.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 6 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 5 +++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 8 ++++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 8 +++---
> drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 8 +++---
> drivers/gpu/drm/scheduler/sched_entity.c | 32 +++++++++----------
> --
> drivers/gpu/drm/scheduler/sched_fence.c | 2 +-
> drivers/gpu/drm/scheduler/sched_internal.h | 6 ++--
> drivers/gpu/drm/scheduler/sched_main.c | 31 ++++---------------
> -
> drivers/gpu/drm/scheduler/sched_rq.c | 18 ++++++------
> include/drm/gpu_scheduler.h | 5 +---
> 12 files changed, 58 insertions(+), 77 deletions(-)
That's looking great :)
Looking forward to us moving there
P.
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 82df06a72ee0..e18e180bf32c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1108,7 +1108,8 @@ static int amdgpu_cs_vm_handling(struct
> amdgpu_cs_parser *p)
> if (p->gang_size > 1 && !adev->vm_manager.concurrent_flush)
> {
> for (i = 0; i < p->gang_size; ++i) {
> struct drm_sched_entity *entity = p-
> >entities[i];
> - struct drm_gpu_scheduler *sched = entity-
> >rq->sched;
> + struct drm_gpu_scheduler *sched =
> + container_of(entity->rq,
> typeof(*sched), rq);
> struct amdgpu_ring *ring =
> to_amdgpu_ring(sched);
>
> if (amdgpu_vmid_uses_reserved(vm, ring-
> >vm_hub))
> @@ -1236,7 +1237,8 @@ static int amdgpu_cs_sync_rings(struct
> amdgpu_cs_parser *p)
> return r;
> }
>
> - sched = p->gang_leader->base.entity->rq->sched;
> + sched = container_of(p->gang_leader->base.entity->rq,
> typeof(*sched),
> + rq);
> while ((fence = amdgpu_sync_get_fence(&p->sync))) {
> struct drm_sched_fence *s_fence =
> to_drm_sched_fence(fence);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index 9440af58073b..e3d4f7503738 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -359,7 +359,9 @@ static struct dma_fence *
> amdgpu_job_prepare_job(struct drm_sched_job *sched_job,
> struct drm_sched_entity *s_entity)
> {
> - struct amdgpu_ring *ring = to_amdgpu_ring(s_entity->rq-
> >sched);
> + struct drm_gpu_scheduler *sched =
> + container_of(s_entity->rq, typeof(*sched), rq);
> + struct amdgpu_ring *ring = to_amdgpu_ring(sched);
> struct amdgpu_job *job = to_amdgpu_job(sched_job);
> struct dma_fence *fence;
> int r;
> @@ -459,7 +461,7 @@ drm_sched_entity_queue_pop(struct
> drm_sched_entity *entity)
>
> void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler
> *sched)
> {
> - struct drm_sched_rq *rq = sched->rq;
> + struct drm_sched_rq *rq = &sched->rq;
> struct drm_sched_entity *s_entity;
> struct drm_sched_job *s_job;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> index ce6b9ba967ff..d6872baeba1e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> @@ -85,7 +85,10 @@ struct amdgpu_job {
>
> static inline struct amdgpu_ring *amdgpu_job_ring(struct amdgpu_job
> *job)
> {
> - return to_amdgpu_ring(job->base.entity->rq->sched);
> + struct drm_gpu_scheduler *sched =
> + container_of(job->base.entity->rq, typeof(*sched),
> rq);
> +
> + return to_amdgpu_ring(sched);
> }
>
> int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm
> *vm,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> index 11dd2e0f7979..197d20a37afb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> @@ -145,6 +145,7 @@ TRACE_EVENT(amdgpu_cs,
> struct amdgpu_ib *ib),
> TP_ARGS(p, job, ib),
> TP_STRUCT__entry(
> + __field(struct drm_gpu_scheduler *,
> sched)
> __field(struct amdgpu_bo_list *,
> bo_list)
> __field(u32, ring)
> __field(u32, dw)
> @@ -152,11 +153,14 @@ TRACE_EVENT(amdgpu_cs,
> ),
>
> TP_fast_assign(
> + __entry->sched = container_of(job-
> >base.entity->rq,
> +
> typeof(*__entry->sched),
> + rq);
> __entry->bo_list = p->bo_list;
> - __entry->ring = to_amdgpu_ring(job-
> >base.entity->rq->sched)->idx;
> + __entry->ring = to_amdgpu_ring(__entry-
> >sched)->idx;
> __entry->dw = ib->length_dw;
> __entry->fences =
> amdgpu_fence_count_emitted(
> - to_amdgpu_ring(job->base.entity->rq-
> >sched));
> + to_amdgpu_ring(__entry->sched));
> ),
> TP_printk("bo_list=%p, ring=%u, dw=%u, fences=%u",
> __entry->bo_list, __entry->ring, __entry->dw,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> index 46d9fb433ab2..42f2bfb30af1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> @@ -105,13 +105,13 @@ static int amdgpu_vm_sdma_prepare(struct
> amdgpu_vm_update_params *p,
> static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
> struct dma_fence **fence)
> {
> + struct drm_gpu_scheduler *sched =
> + container_of(p->vm->delayed.rq, typeof(*sched), rq);
> + struct amdgpu_ring *ring =
> + container_of(sched, struct amdgpu_ring, sched);
> struct amdgpu_ib *ib = p->job->ibs;
> - struct amdgpu_ring *ring;
> struct dma_fence *f;
>
> - ring = container_of(p->vm->delayed.rq->sched, struct
> amdgpu_ring,
> - sched);
> -
> WARN_ON(ib->length_dw == 0);
> amdgpu_ring_pad_ib(ring, ib);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
> index 23b6f7a4aa4a..ab132dae8183 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
> @@ -420,15 +420,15 @@ int amdgpu_xcp_open_device(struct amdgpu_device
> *adev,
> void amdgpu_xcp_release_sched(struct amdgpu_device *adev,
> struct amdgpu_ctx_entity *entity)
> {
> - struct drm_gpu_scheduler *sched;
> - struct amdgpu_ring *ring;
> + struct drm_gpu_scheduler *sched =
> + container_of(entity->entity.rq, typeof(*sched), rq);
>
> if (!adev->xcp_mgr)
> return;
>
> - sched = entity->entity.rq->sched;
> if (drm_sched_wqueue_ready(sched)) {
> - ring = to_amdgpu_ring(entity->entity.rq->sched);
> + struct amdgpu_ring *ring = to_amdgpu_ring(sched);
> +
> atomic_dec(&adev->xcp_mgr->xcp[ring-
> >xcp_id].ref_cnt);
> }
> }
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index d149df2a2050..bc890f735552 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -104,19 +104,12 @@ int drm_sched_entity_init(struct
> drm_sched_entity *entity,
> * is initialized itself.
> */
> entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
> + if (num_sched_list) {
> + entity->sched_list = num_sched_list > 1 ? sched_list
> : NULL;
> + entity->rq = &sched_list[0]->rq;
> + }
> RCU_INIT_POINTER(entity->last_scheduled, NULL);
> RB_CLEAR_NODE(&entity->rb_tree_node);
> -
> - if (num_sched_list && !sched_list[0]->rq) {
> - /* Since every entry covered by num_sched_list
> - * should be non-NULL and therefore we warn drivers
> - * not to do this and to fix their DRM calling
> order.
> - */
> - pr_warn("%s: called with uninitialized scheduler\n",
> __func__);
> - } else if (num_sched_list) {
> - entity->rq = sched_list[0]->rq;
> - }
> -
> init_completion(&entity->entity_idle);
>
> /* We start in an idle state. */
> @@ -303,7 +296,7 @@ long drm_sched_entity_flush(struct
> drm_sched_entity *entity, long timeout)
> if (!entity->rq)
> return 0;
>
> - sched = entity->rq->sched;
> + sched = container_of(entity->rq, typeof(*sched), rq);
> /**
> * The client will not queue more IBs during this fini,
> consume existing
> * queued IBs or discard them on SIGKILL
> @@ -395,9 +388,11 @@ static void drm_sched_entity_wakeup(struct
> dma_fence *f,
> {
> struct drm_sched_entity *entity =
> container_of(cb, struct drm_sched_entity, cb);
> + struct drm_gpu_scheduler *sched =
> + container_of(entity->rq, typeof(*sched), rq);
>
> drm_sched_entity_clear_dep(f, cb);
> - drm_sched_wakeup(entity->rq->sched);
> + drm_sched_wakeup(sched);
> }
>
> /**
> @@ -423,7 +418,8 @@ EXPORT_SYMBOL(drm_sched_entity_set_priority);
> */
> static bool drm_sched_entity_add_dependency_cb(struct
> drm_sched_entity *entity)
> {
> - struct drm_gpu_scheduler *sched = entity->rq->sched;
> + struct drm_gpu_scheduler *sched =
> + container_of(entity->rq, typeof(*sched), rq);
> struct dma_fence *fence = entity->dependency;
> struct drm_sched_fence *s_fence;
>
> @@ -562,7 +558,7 @@ void drm_sched_entity_select_rq(struct
> drm_sched_entity *entity)
>
> spin_lock(&entity->lock);
> sched = drm_sched_pick_best(entity->sched_list, entity-
> >num_sched_list);
> - rq = sched ? sched->rq : NULL;
> + rq = sched ? &sched->rq : NULL;
> if (rq != entity->rq) {
> drm_sched_rq_remove_entity(entity->rq, entity);
> entity->rq = rq;
> @@ -585,10 +581,12 @@ void drm_sched_entity_select_rq(struct
> drm_sched_entity *entity)
> void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> {
> struct drm_sched_entity *entity = sched_job->entity;
> + struct drm_gpu_scheduler *sched =
> + container_of(entity->rq, typeof(*sched), rq);
> bool first;
>
> trace_drm_sched_job(sched_job, entity);
> - atomic_inc(entity->rq->sched->score);
> + atomic_inc(sched->score);
> WRITE_ONCE(entity->last_user, current->group_leader);
>
> /*
> @@ -599,8 +597,6 @@ void drm_sched_entity_push_job(struct
> drm_sched_job *sched_job)
>
> /* first job wakes up scheduler */
> if (first) {
> - struct drm_gpu_scheduler *sched;
> -
> sched = drm_sched_rq_add_entity(entity);
> if (sched)
> drm_sched_wakeup(sched);
> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c
> b/drivers/gpu/drm/scheduler/sched_fence.c
> index e971528504a5..bb48e690862d 100644
> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> @@ -225,7 +225,7 @@ void drm_sched_fence_init(struct drm_sched_fence
> *fence,
> {
> unsigned seq;
>
> - fence->sched = entity->rq->sched;
> + fence->sched = container_of(entity->rq, typeof(*fence-
> >sched), rq);
> seq = atomic_inc_return(&entity->fence_seq);
> dma_fence_init(&fence->scheduled,
> &drm_sched_fence_ops_scheduled,
> &fence->lock, entity->fence_context, seq);
> diff --git a/drivers/gpu/drm/scheduler/sched_internal.h
> b/drivers/gpu/drm/scheduler/sched_internal.h
> index c1f523bc9379..df8684689962 100644
> --- a/drivers/gpu/drm/scheduler/sched_internal.h
> +++ b/drivers/gpu/drm/scheduler/sched_internal.h
> @@ -17,11 +17,9 @@ struct drm_sched_entity_stats {
>
> void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>
> -void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
> - struct drm_sched_rq *rq);
> +void drm_sched_rq_init(struct drm_gpu_scheduler *sched);
> struct drm_sched_entity *
> -drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
> - struct drm_sched_rq *rq);
> +drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched);
> struct drm_gpu_scheduler *
> drm_sched_rq_add_entity(struct drm_sched_entity *entity);
> void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 44222cfe4dc0..d2a2202dac3a 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -572,7 +572,7 @@ void drm_sched_job_arm(struct drm_sched_job *job)
>
> BUG_ON(!entity);
> drm_sched_entity_select_rq(entity);
> - sched = entity->rq->sched;
> + sched = container_of(entity->rq, typeof(*sched), rq);
>
> job->sched = sched;
> job->s_priority = entity->priority;
> @@ -914,7 +914,7 @@ static void drm_sched_run_job_work(struct
> work_struct *w)
>
> while (!READ_ONCE(sched->pause_submit)) {
> /* Find entity with a ready job */
> - entity = drm_sched_rq_select_entity(sched, sched-
> >rq);
> + entity = drm_sched_rq_select_entity(sched);
> if (!entity)
> break; /* No more work */
>
> @@ -1006,15 +1006,6 @@ int drm_sched_init(struct drm_gpu_scheduler
> *sched, const struct drm_sched_init_
> sched->score = args->score ? args->score : &sched->_score;
> sched->dev = args->dev;
>
> - if (sched->rq) {
> - /* Not an error, but warn anyway so drivers can
> - * fine-tune their DRM calling order, and return all
> - * is good.
> - */
> - dev_warn(sched->dev, "%s: scheduler already
> initialized!\n", __func__);
> - return 0;
> - }
> -
> if (args->submit_wq) {
> sched->submit_wq = args->submit_wq;
> sched->own_submit_wq = false;
> @@ -1026,11 +1017,7 @@ int drm_sched_init(struct drm_gpu_scheduler
> *sched, const struct drm_sched_init_
> sched->own_submit_wq = true;
> }
>
> - sched->rq = kmalloc(sizeof(*sched->rq), GFP_KERNEL |
> __GFP_ZERO);
> - if (!sched->rq)
> - goto Out_check_own;
> -
> - drm_sched_rq_init(sched, sched->rq);
> + drm_sched_rq_init(sched);
>
> init_waitqueue_head(&sched->job_scheduled);
> INIT_LIST_HEAD(&sched->pending_list);
> @@ -1045,12 +1032,6 @@ int drm_sched_init(struct drm_gpu_scheduler
> *sched, const struct drm_sched_init_
>
> sched->ready = true;
> return 0;
> -
> -Out_check_own:
> - if (sched->own_submit_wq)
> - destroy_workqueue(sched->submit_wq);
> - dev_err(sched->dev, "%s: Failed to setup GPU scheduler--out
> of memory\n", __func__);
> - return -ENOMEM;
> }
> EXPORT_SYMBOL(drm_sched_init);
>
> @@ -1078,7 +1059,7 @@ EXPORT_SYMBOL(drm_sched_init);
> void drm_sched_fini(struct drm_gpu_scheduler *sched)
> {
>
> - struct drm_sched_rq *rq = sched->rq;
> + struct drm_sched_rq *rq = &sched->rq;
> struct drm_sched_entity *s_entity;
>
> drm_sched_wqueue_stop(sched);
> @@ -1102,8 +1083,6 @@ void drm_sched_fini(struct drm_gpu_scheduler
> *sched)
> if (sched->own_submit_wq)
> destroy_workqueue(sched->submit_wq);
> sched->ready = false;
> - kfree(sched->rq);
> - sched->rq = NULL;
> }
> EXPORT_SYMBOL(drm_sched_fini);
>
> @@ -1120,7 +1099,7 @@ void drm_sched_increase_karma(struct
> drm_sched_job *bad)
> {
> struct drm_gpu_scheduler *sched = bad->sched;
> struct drm_sched_entity *entity, *tmp;
> - struct drm_sched_rq *rq = sched->rq;
> + struct drm_sched_rq *rq = &sched->rq;
>
> /* don't change @bad's karma if it's from KERNEL RQ,
> * because sometimes GPU hang would cause kernel jobs (like
> VM updating jobs)
> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c
> b/drivers/gpu/drm/scheduler/sched_rq.c
> index b18265c7f073..f2f10f7d6ddf 100644
> --- a/drivers/gpu/drm/scheduler/sched_rq.c
> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
> @@ -52,17 +52,16 @@ static void
> drm_sched_rq_update_tree_locked(struct drm_sched_entity *entity,
> * drm_sched_rq_init - initialize a given run queue struct
> *
> * @sched: scheduler instance to associate with this run queue
> - * @rq: scheduler run queue
> *
> * Initializes a scheduler runqueue.
> */
> -void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
> - struct drm_sched_rq *rq)
> +void drm_sched_rq_init(struct drm_gpu_scheduler *sched)
> {
> + struct drm_sched_rq *rq = &sched->rq;
> +
> spin_lock_init(&rq->lock);
> INIT_LIST_HEAD(&rq->entities);
> rq->rb_tree_root = RB_ROOT_CACHED;
> - rq->sched = sched;
> }
>
> static ktime_t
> @@ -109,8 +108,8 @@ drm_sched_rq_add_entity(struct drm_sched_entity
> *entity)
> }
>
> rq = entity->rq;
> + sched = container_of(rq, typeof(*sched), rq);
> spin_lock(&rq->lock);
> - sched = rq->sched;
>
> if (list_empty(&entity->list)) {
> atomic_inc(sched->score);
> @@ -138,6 +137,8 @@ drm_sched_rq_add_entity(struct drm_sched_entity
> *entity)
> void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> struct drm_sched_entity *entity)
> {
> + struct drm_gpu_scheduler *sched = container_of(rq,
> typeof(*sched), rq);
> +
> lockdep_assert_held(&entity->lock);
>
> if (list_empty(&entity->list))
> @@ -145,7 +146,7 @@ void drm_sched_rq_remove_entity(struct
> drm_sched_rq *rq,
>
> spin_lock(&rq->lock);
>
> - atomic_dec(rq->sched->score);
> + atomic_dec(sched->score);
> list_del_init(&entity->list);
>
> drm_sched_rq_remove_tree_locked(entity, rq);
> @@ -186,16 +187,15 @@ void drm_sched_rq_pop_entity(struct
> drm_sched_entity *entity)
> * drm_sched_rq_select_entity - Select an entity which provides a
> job to run
> *
> * @sched: the gpu scheduler
> - * @rq: scheduler run queue to check.
> *
> * Find oldest waiting ready entity.
> *
> * Return an entity if one is found or NULL if no ready entity was
> found.
> */
> struct drm_sched_entity *
> -drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched,
> - struct drm_sched_rq *rq)
> +drm_sched_rq_select_entity(struct drm_gpu_scheduler *sched)
> {
> + struct drm_sched_rq *rq = &sched->rq;
> struct rb_node *rb;
>
> spin_lock(&rq->lock);
> diff --git a/include/drm/gpu_scheduler.h
> b/include/drm/gpu_scheduler.h
> index e9ff24c076aa..fd488ccece9a 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -242,7 +242,6 @@ struct drm_sched_entity {
> /**
> * struct drm_sched_rq - queue of entities to be scheduled.
> *
> - * @sched: the scheduler to which this rq belongs to.
> * @lock: protects @entities, @rb_tree_root and @rr_deadline.
> * @entities: list of the entities to be scheduled.
> * @rb_tree_root: root of time based priority queue of entities for
> FIFO scheduling
> @@ -252,8 +251,6 @@ struct drm_sched_entity {
> * the next entity to emit commands from.
> */
> struct drm_sched_rq {
> - struct drm_gpu_scheduler *sched;
> -
> spinlock_t lock;
> /* Following members are protected by the @lock: */
> ktime_t rr_deadline;
> @@ -548,7 +545,7 @@ struct drm_gpu_scheduler {
> atomic_t credit_count;
> long timeout;
> const char *name;
> - struct drm_sched_rq *rq;
> + struct drm_sched_rq rq;
> wait_queue_head_t job_scheduled;
> atomic64_t job_id_count;
> struct workqueue_struct *submit_wq;
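The whole diff above hinges on a single idiom: with the run queue embedded in the scheduler instead of pointed to, the owning scheduler can always be recovered from a run queue pointer with container_of(). A minimal standalone sketch of that idiom follows; the _stub structs are simplified stand-ins, not the real DRM types:

#include <stddef.h>

/* Simplified version of the kernel's container_of(), so the sketch
 * compiles standalone. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct sched_rq_stub {
        int num_entities;
};

struct gpu_scheduler_stub {
        const char *name;
        struct sched_rq_stub rq;        /* embedded, no longer a pointer */
};

/* Recover the owning scheduler from a pointer to its embedded run queue. */
static inline struct gpu_scheduler_stub *
sched_from_rq(struct sched_rq_stub *rq)
{
        return container_of(rq, struct gpu_scheduler_stub, rq);
}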
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 04/16] drm/sched: Avoid double re-lock on the job free path
2025-05-12 12:49 ` Philipp Stanner
2025-05-12 12:57 ` Matthew Brost
@ 2025-05-14 8:46 ` Tvrtko Ursulin
1 sibling, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-05-14 8:46 UTC (permalink / raw)
To: phasta, amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost
On 12/05/2025 13:49, Philipp Stanner wrote:
> On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
>> Currently the job free work item will lock sched->job_list_lock
>> first time to see if there are any jobs, free a single job, and then
>> lock again to decide whether to re-queue itself if there are more
>> finished jobs.
>>
>> Since drm_sched_get_finished_job() already looks at the second job in
>> the queue we can simply add the signaled check and have it return the
>> presence of more jobs to free to the caller. That way the work item
>> does not have to lock the list again and repeat the signaled check.
>
> Are you convinced that this is worth it?
I cannot see a reason for the lazy code which re-locks only to get the
same boolean state it already peeked at, so yes, I am. Maybe CPU vendors
don't mind burning extra cycles to sell us faster chips, I don't know. :D
A more interesting angle is that the patch removes the potential
opportunistic signaling from the free worker (the bad old
evil dma_fence_is_signaled).
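To make the locking argument concrete, here is a compilable userspace sketch of the pattern the patch introduces: decide, while still holding the list lock, whether the next job is also finished, so the caller never has to re-take the lock just to answer that question. The struct and function names are illustrative only, not the real scheduler code:

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct job {
        struct job *next;
        bool finished;
};

struct sched {
        pthread_mutex_t lock;
        struct job *pending;            /* head of the pending list */
};

/* Pop the first finished job; *have_more tells the caller whether the
 * next pending job is already finished as well, decided under the same
 * lock acquisition instead of a second lock/peek round-trip. */
static struct job *get_finished_job(struct sched *s, bool *have_more)
{
        struct job *job = NULL;

        *have_more = false;
        pthread_mutex_lock(&s->lock);
        if (s->pending && s->pending->finished) {
                job = s->pending;
                s->pending = job->next;
                *have_more = s->pending && s->pending->finished;
        }
        pthread_mutex_unlock(&s->lock);

        return job;
}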
> I'm torn. It's rare that one returns a status through a boolean by
> reference.
>
> Independently from that, this is a candidate which certainly can be
> branched out from this series, to make the series completely about the
> new scheduling policy, not general other improvements.
If I get an r-b I can easily send it standalone. Until then I'll let it simmer.
Regards,
Tvrtko
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
>> ---
>> drivers/gpu/drm/scheduler/sched_main.c | 39 +++++++++++-------------
>> --
>> 1 file changed, 16 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 86e40157b09b..a45b02fd2af3 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -365,22 +365,6 @@ static void __drm_sched_run_free_queue(struct
>> drm_gpu_scheduler *sched)
>> queue_work(sched->submit_wq, &sched->work_free_job);
>> }
>>
>> -/**
>> - * drm_sched_run_free_queue - enqueue free-job work if ready
>> - * @sched: scheduler instance
>> - */
>> -static void drm_sched_run_free_queue(struct drm_gpu_scheduler
>> *sched)
>> -{
>> - struct drm_sched_job *job;
>> -
>> - spin_lock(&sched->job_list_lock);
>> - job = list_first_entry_or_null(&sched->pending_list,
>> - struct drm_sched_job, list);
>> - if (job && dma_fence_is_signaled(&job->s_fence->finished))
>> - __drm_sched_run_free_queue(sched);
>> - spin_unlock(&sched->job_list_lock);
>> -}
>> -
>> /**
>> * drm_sched_job_done - complete a job
>> * @s_job: pointer to the job which is done
>> @@ -1097,12 +1081,13 @@ drm_sched_select_entity(struct
>> drm_gpu_scheduler *sched)
>> * drm_sched_get_finished_job - fetch the next finished job to be
>> destroyed
>> *
>> * @sched: scheduler instance
>> + * @have_more: are there more finished jobs on the list
>> *
>> * Returns the next finished job from the pending list (if there is
>> one)
>> * ready for it to be destroyed.
>> */
>> static struct drm_sched_job *
>> -drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
>> +drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool
>> *have_more)
>> {
>> struct drm_sched_job *job, *next;
>>
>> @@ -1110,22 +1095,27 @@ drm_sched_get_finished_job(struct
>> drm_gpu_scheduler *sched)
>>
>> job = list_first_entry_or_null(&sched->pending_list,
>> struct drm_sched_job, list);
>> -
>> if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>> /* remove job from pending_list */
>> list_del_init(&job->list);
>>
>> /* cancel this job's TO timer */
>> cancel_delayed_work(&sched->work_tdr);
>> - /* make the scheduled timestamp more accurate */
>> +
>> + *have_more = false;
>> next = list_first_entry_or_null(&sched-
>>> pending_list,
>> typeof(*next),
>> list);
>> -
>> if (next) {
>> + /* make the scheduled timestamp more
>> accurate */
>> if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
>> &next->s_fence-
>>> scheduled.flags))
>> next->s_fence->scheduled.timestamp =
>> dma_fence_timestamp(&job-
>>> s_fence->finished);
>> +
>> + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>> + &next->s_fence-
>>> finished.flags))
>> + *have_more = true;
>> +
>> /* start TO timer for next job */
>> drm_sched_start_timeout(sched);
>> }
>> @@ -1184,12 +1174,15 @@ static void drm_sched_free_job_work(struct
>> work_struct *w)
>> struct drm_gpu_scheduler *sched =
>> container_of(w, struct drm_gpu_scheduler,
>> work_free_job);
>> struct drm_sched_job *job;
>> + bool have_more;
>>
>> - job = drm_sched_get_finished_job(sched);
>> - if (job)
>> + job = drm_sched_get_finished_job(sched, &have_more);
>> + if (job) {
>> sched->ops->free_job(job);
>> + if (have_more)
>> + __drm_sched_run_free_queue(sched);
>> + }
>>
>> - drm_sched_run_free_queue(sched);
>> drm_sched_run_job_queue(sched);
>> }
>>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 04/16] drm/sched: Avoid double re-lock on the job free path
2025-05-12 12:57 ` Matthew Brost
@ 2025-05-14 8:54 ` Tvrtko Ursulin
0 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-05-14 8:54 UTC (permalink / raw)
To: Matthew Brost, phasta
Cc: amd-gfx, dri-devel, kernel-dev, Christian König,
Danilo Krummrich
On 12/05/2025 13:57, Matthew Brost wrote:
> On Mon, May 12, 2025 at 02:49:55PM +0200, Philipp Stanner wrote:
>> On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
>>> Currently the job free work item will lock sched->job_list_lock
>>> first time to see if there are any jobs, free a single job, and then
>>> lock again to decide whether to re-queue itself if there are more
>>> finished jobs.
>>>
>>> Since drm_sched_get_finished_job() already looks at the second job
>>> in the queue we can simply add the signaled check and have it return
>>> the presence of more jobs to free to the caller. That way the work
>>> item does not have to lock the list again and repeat the signaled
>>> check.
>>
>> Are you convinced that this is worth it?
>>
>> I'm torn. It's rare that one returns a status through a boolean by
>> reference.
>>
>
> I'd say no to this (mirco optimization) and to freeing / running more
It would be nice if the "no" came with some explanation.
> than job per worker invocation. The later was rejected in original work
> queue conversion.
This applies to two other patches from the series.
For sched->credit_limit, I could limit it to batches or via
cond_resched() perhaps, if that is your concern.
Although TBH for 1:1 drivers like xe (with large-ish credit_limit) I
would have thought you would actually be extra motivated to pass along
as much as possible to the GuC, as soon as possible, and not rely on
work item re-queues.
For freeing in batches, I need it for more accurate GPU utilisation
stats. What reason do you see for that to be problematic?
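Purely as an illustration of the batching idea, a userspace sketch of what "limit it to batches or via cond_resched()" could look like: drain everything that is already finished in one worker pass, yielding periodically so a long backlog cannot monopolise the CPU. The helpers are hypothetical, sched_yield() stands in for the kernel's cond_resched(), and the batch size of 16 is arbitrary:

#include <sched.h>      /* sched_yield(), standing in for cond_resched() */
#include <stddef.h>

struct job;
struct sched_ctx;

/* Hypothetical helpers mirroring the scheduler's free path in spirit only. */
struct job *get_finished_job(struct sched_ctx *s);      /* NULL when drained */
void free_job(struct job *job);

/* Free every already-finished job in one worker invocation, yielding
 * every few jobs so a large backlog does not hog the CPU. */
static void free_job_worker(struct sched_ctx *s)
{
        unsigned int batch = 0;
        struct job *job;

        while ((job = get_finished_job(s)) != NULL) {
                free_job(job);
                if (++batch % 16 == 0)
                        sched_yield();
        }
}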
Regards,
Tvrtko
>> Independently from that, this is a candidate which certainly can be
>> branched out from this series, to make the series completely about the
>> new scheduling policy, not general other improvements.
>>
>>
>> P.
>>
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Danilo Krummrich <dakr@kernel.org>
>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>> Cc: Philipp Stanner <phasta@kernel.org>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_main.c | 39 +++++++++++-------------
>>> --
>>> 1 file changed, 16 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 86e40157b09b..a45b02fd2af3 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -365,22 +365,6 @@ static void __drm_sched_run_free_queue(struct
>>> drm_gpu_scheduler *sched)
>>> queue_work(sched->submit_wq, &sched->work_free_job);
>>> }
>>>
>>> -/**
>>> - * drm_sched_run_free_queue - enqueue free-job work if ready
>>> - * @sched: scheduler instance
>>> - */
>>> -static void drm_sched_run_free_queue(struct drm_gpu_scheduler
>>> *sched)
>>> -{
>>> - struct drm_sched_job *job;
>>> -
>>> - spin_lock(&sched->job_list_lock);
>>> - job = list_first_entry_or_null(&sched->pending_list,
>>> - struct drm_sched_job, list);
>>> - if (job && dma_fence_is_signaled(&job->s_fence->finished))
>>> - __drm_sched_run_free_queue(sched);
>>> - spin_unlock(&sched->job_list_lock);
>>> -}
>>> -
>>> /**
>>> * drm_sched_job_done - complete a job
>>> * @s_job: pointer to the job which is done
>>> @@ -1097,12 +1081,13 @@ drm_sched_select_entity(struct
>>> drm_gpu_scheduler *sched)
>>> * drm_sched_get_finished_job - fetch the next finished job to be
>>> destroyed
>>> *
>>> * @sched: scheduler instance
>>> + * @have_more: are there more finished jobs on the list
>>> *
>>> * Returns the next finished job from the pending list (if there is
>>> one)
>>> * ready for it to be destroyed.
>>> */
>>> static struct drm_sched_job *
>>> -drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
>>> +drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool
>>> *have_more)
>>> {
>>> struct drm_sched_job *job, *next;
>>>
>>> @@ -1110,22 +1095,27 @@ drm_sched_get_finished_job(struct
>>> drm_gpu_scheduler *sched)
>>>
>>> job = list_first_entry_or_null(&sched->pending_list,
>>> struct drm_sched_job, list);
>>> -
>>> if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>> /* remove job from pending_list */
>>> list_del_init(&job->list);
>>>
>>> /* cancel this job's TO timer */
>>> cancel_delayed_work(&sched->work_tdr);
>>> - /* make the scheduled timestamp more accurate */
>>> +
>>> + *have_more = false;
>>> next = list_first_entry_or_null(&sched-
>>>> pending_list,
>>> typeof(*next),
>>> list);
>>> -
>>> if (next) {
>>> + /* make the scheduled timestamp more
>>> accurate */
>>> if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
>>> &next->s_fence-
>>>> scheduled.flags))
>>> next->s_fence->scheduled.timestamp =
>>> dma_fence_timestamp(&job-
>>>> s_fence->finished);
>>> +
>>> + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>>> + &next->s_fence-
>>>> finished.flags))
>>> + *have_more = true;
>>> +
>>> /* start TO timer for next job */
>>> drm_sched_start_timeout(sched);
>>> }
>>> @@ -1184,12 +1174,15 @@ static void drm_sched_free_job_work(struct
>>> work_struct *w)
>>> struct drm_gpu_scheduler *sched =
>>> container_of(w, struct drm_gpu_scheduler,
>>> work_free_job);
>>> struct drm_sched_job *job;
>>> + bool have_more;
>>>
>>> - job = drm_sched_get_finished_job(sched);
>>> - if (job)
>>> + job = drm_sched_get_finished_job(sched, &have_more);
>>> + if (job) {
>>> sched->ops->free_job(job);
>>> + if (have_more)
>>> + __drm_sched_run_free_queue(sched);
>>> + }
>>>
>>> - drm_sched_run_free_queue(sched);
>>> drm_sched_run_job_queue(sched);
>>> }
>>>
>>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 05/16] drm/sched: Consolidate drm_sched_job_timedout
2025-05-12 12:53 ` Philipp Stanner
@ 2025-05-14 8:57 ` Tvrtko Ursulin
0 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-05-14 8:57 UTC (permalink / raw)
To: phasta, amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost,
Maíra Canal
On 12/05/2025 13:53, Philipp Stanner wrote:
> On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
>> Reduce to one spin_unlock for hopefully a little bit clearer flow in
>> the function. It may appear that there is a behavioural change with
>> the drm_sched_start_timeout_unlocked() now not being called if there
>> were initially no jobs on the pending list, and then some appeared
>> after unlock, however if the code would rely on the TDR handler
>> restarting itself then it would fail to do that if the job arrived on
>> the pending list after the check.
>>
>> Also fix one stale comment while touching the function.
>
> Same here, that's a good candidate for a separate patch / series.
It conflicts with the in-progress work from Maíra (fixing memory leaks
on false timeouts), so I will keep this one on the back-burner until her
work lands.
Regards,
Tvrtko
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
>> ---
>> drivers/gpu/drm/scheduler/sched_main.c | 37 +++++++++++++-----------
>> --
>> 1 file changed, 18 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index a45b02fd2af3..a26cc11c8ade 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -516,38 +516,37 @@ static void drm_sched_job_begin(struct
>> drm_sched_job *s_job)
>>
>> static void drm_sched_job_timedout(struct work_struct *work)
>> {
>> - struct drm_gpu_scheduler *sched;
>> + struct drm_gpu_scheduler *sched =
>> + container_of(work, struct drm_gpu_scheduler,
>> work_tdr.work);
>> + enum drm_gpu_sched_stat status;
>> struct drm_sched_job *job;
>> - enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL;
>> -
>> - sched = container_of(work, struct drm_gpu_scheduler,
>> work_tdr.work);
>>
>> /* Protects against concurrent deletion in
>> drm_sched_get_finished_job */
>> spin_lock(&sched->job_list_lock);
>> job = list_first_entry_or_null(&sched->pending_list,
>> struct drm_sched_job, list);
>> -
>> if (job) {
>> /*
>> * Remove the bad job so it cannot be freed by
>> concurrent
>> - * drm_sched_cleanup_jobs. It will be reinserted
>> back after sched->thread
>> - * is parked at which point it's safe.
>> + * drm_sched_get_finished_job. It will be reinserted
>> back after
>> + * scheduler worker is stopped at which point it's
>> safe.
>> */
>> list_del_init(&job->list);
>> - spin_unlock(&sched->job_list_lock);
>> + }
>> + spin_unlock(&sched->job_list_lock);
>>
>> - status = job->sched->ops->timedout_job(job);
>> + if (!job)
>> + return;
>>
>> - /*
>> - * Guilty job did complete and hence needs to be
>> manually removed
>> - * See drm_sched_stop doc.
>> - */
>> - if (sched->free_guilty) {
>> - job->sched->ops->free_job(job);
>> - sched->free_guilty = false;
>> - }
>> - } else {
>> - spin_unlock(&sched->job_list_lock);
>> + status = job->sched->ops->timedout_job(job);
>> +
>> + /*
>> + * Guilty job did complete and hence needs to be manually
>> removed. See
>> + * documentation for drm_sched_stop.
>> + */
>> + if (sched->free_guilty) {
>> + job->sched->ops->free_job(job);
>> + sched->free_guilty = false;
>> }
>>
>> if (status != DRM_GPU_SCHED_STAT_ENODEV)
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 10/16] drm/sched: Free all finished jobs at once
2025-05-12 12:56 ` Philipp Stanner
@ 2025-05-14 9:00 ` Tvrtko Ursulin
0 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-05-14 9:00 UTC (permalink / raw)
To: phasta, amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost
On 12/05/2025 13:56, Philipp Stanner wrote:
> On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
>> To implement fair scheduling we will need as accurate a view as
>> possible into per-entity GPU time utilisation. Because sched fence
>> execution times are only adjusted for accuracy in the free worker we
>> need to process completed jobs as soon as possible so the metric is
>> most up to date when viewed from the submission side of things.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
>> ---
>> drivers/gpu/drm/scheduler/sched_main.c | 15 ++-------------
>> 1 file changed, 2 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 8950c7705f57..22428a1569dd 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -865,13 +865,12 @@ drm_sched_select_entity(struct
>> drm_gpu_scheduler *sched)
>> * drm_sched_get_finished_job - fetch the next finished job to be
>> destroyed
>> *
>> * @sched: scheduler instance
>> - * @have_more: are there more finished jobs on the list
>> *
>> * Returns the next finished job from the pending list (if there is
>> one)
>> * ready for it to be destroyed.
>> */
>> static struct drm_sched_job *
>> -drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool
>> *have_more)
>> +drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
>> {
>> struct drm_sched_job *job, *next;
>>
>> @@ -886,7 +885,6 @@ drm_sched_get_finished_job(struct
>> drm_gpu_scheduler *sched, bool *have_more)
>> /* cancel this job's TO timer */
>> cancel_delayed_work(&sched->work_tdr);
>>
>> - *have_more = false;
>> next = list_first_entry_or_null(&sched-
>>> pending_list,
>> typeof(*next),
>> list);
>> if (next) {
>> @@ -896,10 +894,6 @@ drm_sched_get_finished_job(struct
>> drm_gpu_scheduler *sched, bool *have_more)
>> next->s_fence->scheduled.timestamp =
>> dma_fence_timestamp(&job-
>>> s_fence->finished);
>>
>> - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>> - &next->s_fence-
>>> finished.flags))
>> - *have_more = true;
>> -
>> /* start TO timer for next job */
>> drm_sched_start_timeout(sched);
>> }
>> @@ -958,14 +952,9 @@ static void drm_sched_free_job_work(struct
>> work_struct *w)
>> struct drm_gpu_scheduler *sched =
>> container_of(w, struct drm_gpu_scheduler,
>> work_free_job);
>> struct drm_sched_job *job;
>> - bool have_more;
>>
>> - job = drm_sched_get_finished_job(sched, &have_more);
>> - if (job) {
>> + while ((job = drm_sched_get_finished_job(sched)))
>> sched->ops->free_job(job);
>> - if (have_more)
>> - __drm_sched_run_free_queue(sched);
>> - }
>
> Are there any have_more users left after that?
>
> Removing here what was added before IMO makes it more questionable
> adding that improvement in the first place.
Yep, it is definitely not typical to add and then remove stuff in the
same series. The reason is that the series was not intended (or expected)
to get accepted as one. I was expecting the easy cleanups to get in fast
and the rest to keep iterating for who knows how long.
Regards,
Tvrtko
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 12/16] drm/sched: Remove idle entity from tree
2025-05-12 13:03 ` Philipp Stanner
@ 2025-05-14 9:22 ` Tvrtko Ursulin
0 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2025-05-14 9:22 UTC (permalink / raw)
To: phasta, amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost
On 12/05/2025 14:03, Philipp Stanner wrote:
> On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
>> There is no need to keep entities with no jobs in the tree so let's
>> remove them once the last job is consumed. This keeps the tree smaller
>> which is nicer and more efficient as entities are removed and re-added
>> on every popped job.
>
> That there is no need to do so doesn't imply that you can't keep them
> around. The commit message doesn't make the motivation clear
>
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
>> ---
>> drivers/gpu/drm/scheduler/sched_rq.c | 24 +++++++++++++-----------
>> 1 file changed, 13 insertions(+), 11 deletions(-)
>
> Since this doesn't simplify the code base, I think the only
> justification would be a somewhat decent performance gain. Does this
> patch result in that?
>
> Otherwise it's probably better to keep git-blame intact here.
I needed this for one of the earlier approaches and I *think* what
remains with the latest one is just the fact that it makes the run-queue
contain only runnable entities (which makes sense and is logical;
run-queue <-> runnable). Rb-tree re-balancing is also cheaper with
smaller trees, but in the grand scheme of things that is not something I
even considered attempting to measure.
I will re-consider the fate of this patch once more feedback on the
series as overall is received. Until then I don't think it makes sense
to churn it.
Btw, another angle to this, which we touched upon with Christian before,
is that if we end up not pruning the tree of unrunnable entities, we
could drop the drm_sched_rq->entities list, making the handful of
callers which walk it walk the tree instead.
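As a kernel-style sketch of that last idea (walking the existing rb_tree_root instead of keeping a separate entities list), it could look roughly like this. The helper itself is hypothetical, the member names follow the code quoted in this thread, and rq->lock handling is left out for brevity:

#include <linux/rbtree.h>

#include <drm/gpu_scheduler.h>

/* Hypothetical iterator: walk the time-ordered tree instead of the
 * separate drm_sched_rq->entities list. Caller is assumed to hold
 * rq->lock. */
static void drm_sched_rq_for_each_entity(struct drm_sched_rq *rq,
                                         void (*fn)(struct drm_sched_entity *entity))
{
        struct rb_node *node;

        for (node = rb_first_cached(&rq->rb_tree_root); node; node = rb_next(node)) {
                struct drm_sched_entity *entity =
                        rb_entry(node, struct drm_sched_entity, rb_tree_node);

                fn(entity);
        }
}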
Regards,
Tvrtko
>> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c
>> b/drivers/gpu/drm/scheduler/sched_rq.c
>> index d477a027feb9..2cde89cf25fb 100644
>> --- a/drivers/gpu/drm/scheduler/sched_rq.c
>> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
>> @@ -149,25 +149,27 @@ void drm_sched_rq_pop_entity(struct
>> drm_sched_entity *entity)
>> {
>> struct drm_sched_job *next_job;
>> struct drm_sched_rq *rq;
>> - ktime_t ts;
>>
>> /*
>> * Update the entity's location in the min heap according to
>> * the timestamp of the next job, if any.
>> */
>> + spin_lock(&entity->lock);
>> + rq = entity->rq;
>> + spin_lock(&rq->lock);
>> next_job = drm_sched_entity_queue_peek(entity);
>> - if (!next_job)
>> - return;
>> + if (next_job) {
>> + ktime_t ts;
>>
>> - if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
>> - ts = next_job->submit_ts;
>> - else
>> - ts = drm_sched_rq_get_rr_deadline(rq);
>> + if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
>> + ts = next_job->submit_ts;
>> + else
>> + ts = drm_sched_rq_get_rr_deadline(rq);
>>
>> - spin_lock(&entity->lock);
>> - rq = entity->rq;
>> - spin_lock(&rq->lock);
>> - drm_sched_rq_update_fifo_locked(entity, rq, ts);
>> + drm_sched_rq_update_fifo_locked(entity, rq, ts);
>> + } else {
>> + drm_sched_rq_remove_fifo_locked(entity, rq);
>> + }
>> spin_unlock(&rq->lock);
>> spin_unlock(&entity->lock);
>> }
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC v4 00/16] Fair DRM scheduler
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
` (16 preceding siblings ...)
2025-04-29 7:25 ` [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
@ 2025-05-19 16:51 ` Pierre-Eric Pelloux-Prayer
17 siblings, 0 replies; 35+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2025-05-19 16:51 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Leo Liu,
Matthew Brost, Philipp Stanner, Pierre-Eric Pelloux-Prayer,
Michel Dänzer
Hi,
On 25/04/2025 12:20, Tvrtko Ursulin wrote:
> V4 is quite different from v3 in that I have replaced the deadline + queue-depth
> approach with a fair GPU time based approach. This is because Pierre-Eric found
> a viewperf workload which showed queue-depth based approach regressing and
> without it there was a regression on one of my synthetic workloads I was not
> happy with.
I've done some testing with this version and the regression I had is gone.
This branch performs better than drm-next and a bit worse than my own hacky implementation of a CFS scheduler.
For context, my test case is running glxgears and a GPU-heavy Viewperf test, and then monitoring the
FPS of glxgears. The closer glxgears runs to 60 Hz, the better, because it means the scheduler
successfully ran glxgears jobs between the multiple jobs that build a single Viewperf frame.
>
> In my experiments the fair scheduler looks solid so lets see how it fares after
> wider testing.
>
> On the high level main advantages of the series are:
>
> 1. Scheduling quality - schedules better than FIFO.
> 2. Code simplification - no more multiple run queues.
>
> First patches add some unit tests which allow for easy evaluation of scheduling
> behaviour against different client submission patterns. From there onwards it is
> hopefully a natural progression of cleanups, enablers, adding the fair policy,
> and finally removing FIFO and RR and simplifying the code base due not more need
> for multiple run queues.
>
> As a headline result I have tested three simultaneous clients on the Steam Deck:
>
> One instance of a deferredmultisampling Vulkan demo running with low priority,
> one normal priority instance of the same demo, and the Unigine Heaven benchmark.
>
> With the FIFO scheduler we can see that the low priority client is completely
> starved and the GPU time distribution between the other two clients is uneven:
>
> https://people.igalia.com/tursulin/drm-sched-fair/fifo-starvation.png
>
> Switching to the fair scheduler, GPU time distribution is almost equal and the
> low priority client does get a small share of the GPU:
>
> https://people.igalia.com/tursulin/drm-sched-fair/fair-no-starvation.png
>
> Moving onto the synthetic submission patterns, they are about two simultaneous
> clients which broadly cover the following categories:
>
> * Deep queue clients
> * Hogs versus interactive
> * Priority handling
>
> Lets look at the results:
>
> 1. Two normal priority deep queue clients.
>
> These ones submit one second worth of 8ms jobs. As fast as they can, no
> dependencies etc. There is no difference in runtime between FIFO and fair but
> the latter allows both clients to progress with work more evenly:
>
> https://people.igalia.com/tursulin/drm-sched-fair/normal-normal.png
>
> (X axis is time, Y is submitted queue-depth, hence lowering of qd corresponds
> with work progress for both clients, tested with both schedulers separately.)
>
> 2. Same two clients but one is now low priority.
>
> https://people.igalia.com/tursulin/drm-sched-fair/normal-low.png
>
> Normal priority client is a solid line, low priority dotted. We can see how FIFO
> completely starves the low priority client until the normal priority is fully
> done. Only then the low priority client gets any GPU time.
>
> In constrast, fair scheduler allows some GPU time to the low priority client.
>
> 3. Same clients but now high versus normal priority.
>
> Similar behaviour as in the previous one with normal a bit less de-prioritised
> relative to high, than low was against normal.
>
> https://people.igalia.com/tursulin/drm-sched-fair/high-normal.png
>
> 4. Heavy load vs interactive client.
>
> Heavy client emits a 75% GPU load in the format of 3x 2.5ms jobs followed by a
> 2.5ms wait. Interactive client emits a 10% GPU load in the format of 1x 1ms job
> followed by a 9ms wait.
>
> This simulates an interactive graphical client used on top of a relatively heavy
> background load but no GPU oversubscription.
>
> Graphs show the interactive client only and from now on, instead of looking at
> the client's queue depth, we look at its "fps".
>
> https://people.igalia.com/tursulin/drm-sched-fair/heavy-interactive.png
>
> We can see that fair scheduler allows a higher fps for the interactive client
> which is good.
>
> 5. An even heavier load vs interactive client.
>
> This one is oversubscribing the GPU by submitting 4x 50ms jobs and waiting for
> only one microsecond before repeating the cycle. Interactive client is the same
> 10% as above.
>
> https://people.igalia.com/tursulin/drm-sched-fair/veryheavy-interactive.png
>
> Here the difference is even more dramatic with fair scheduler enabling ~3x the
> framerate for the interactive client.
>
> 6. Low priority GPU hog versus heavy-interactive.
>
> Low priority client: 3x 2.5ms jobs client followed by a 0.5ms wait.
> Interactive client: 1x 0.5ms job followed by a 10ms wait.
>
> https://people.igalia.com/tursulin/drm-sched-fair/lowhog-interactive.png
>
> Slight win for the fair scheduler but could be just noise.
>
> 7. Last set of test scenarios will have three subgroups.
>
> In all cases we have two interactive (synchronous, single job at a time) clients
> with a 50% "duty cycle" GPU time usage.
>
> Client 1: 1.5ms job + 1.5ms wait (aka short bursty)
> Client 2: 2.5ms job + 2.5ms wait (aka long bursty)
>
> a) Both normal priority.
>
> https://people.igalia.com/tursulin/drm-sched-fair/5050-short.png
> https://people.igalia.com/tursulin/drm-sched-fair/5050-long.png
>
> Both schedulers favour the higher frequency duty cycle with fair giving it a
> little bit more which should be good for interactivity.
>
> b) Normal vs low priority.
>
> https://people.igalia.com/tursulin/drm-sched-fair/5050-normal-low-normal.png
> https://people.igalia.com/tursulin/drm-sched-fair/5050-normal-low-low.png
>
> Fair scheduler gives a bit more GPU time to the normal priority client which is
> again good.
>
> c) High vs normal priority.
>
> https://people.igalia.com/tursulin/drm-sched-fair/5050-high-normal-high.png
> https://people.igalia.com/tursulin/drm-sched-fair/5050-high-normal-normal.png
>
> Again, fair scheduler gives a bit more share to the higher priority client.
>
> On the overall fair looks like a potential improvement in terms of fairness,
> especially avoiding priority starvation. There do not appear to be any
> regressions with the tested workloads.
>
> As before, I am looking for feedback, ideas for what kind of submission
> scenarios to test. Testers on different GPUs would be very welcome too.
There's room for improvement (e.g. I believe the scheduler would benefit from having logic to
postpone the entity selection when it detects that one app will oversubscribe the GPU), but
overall, if no regressions are found, I think this series is already a solid first step.
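Purely to illustrate the kind of check meant above (nothing like this exists in the series; the names, units and threshold are all hypothetical):

#include <stdbool.h>

/* Would committing the head job starve other runnable entities for more
 * than one scheduling "period"? If so, the worker could postpone the
 * pick (e.g. re-queue itself) instead of dispatching immediately. */
static bool should_postpone_pick(unsigned long long head_job_runtime_ns,
                                 unsigned long long period_budget_ns,
                                 unsigned int other_runnable_entities)
{
        return other_runnable_entities > 0 &&
               head_job_runtime_ns > period_budget_ns;
}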
Thanks,
Pierre-Eric
>
> And I should probably test round-robin at some point, to see if we are maybe
> okay to drop unconditionally, it or further work improving fair would be needed
> if some use cases rely on round-robin.
>
> v2:
> * Fixed many rebase errors.
> * Added some new patches.
> * Dropped single shot dependency handling.
>
> v3:
> * Added scheduling quality unit tests.
> * Refined a tiny bit by adding some fairness.
> * Dropped a few patches for now.
>
> v4:
> * Replaced deadline with fair!
> * Refined scheduling quality unit tests.
> * Pulled one cleanup patch earlier.
> * Fixed "drm/sched: Avoid double re-lock on the job free path".
>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> CC: Leo Liu <Leo.Liu@amd.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> Cc: Michel Dänzer <michel.daenzer@mailbox.org>
>
> Tvrtko Ursulin (16):
> drm/sched: Add some scheduling quality unit tests
> drm/sched: Add some more scheduling quality unit tests
> drm/sched: De-clutter drm_sched_init
> drm/sched: Avoid double re-lock on the job free path
> drm/sched: Consolidate drm_sched_job_timedout
> drm/sched: Consolidate drm_sched_rq_select_entity_rr
> drm/sched: Implement RR via FIFO
> drm/sched: Consolidate entity run queue management
> drm/sched: Move run queue related code into a separate file
> drm/sched: Free all finished jobs at once
> drm/sched: Account entity GPU time
> drm/sched: Remove idle entity from tree
> drm/sched: Add fair scheduling policy
> drm/sched: Remove FIFO and RR and simplify to a single run queue
> drm/sched: Queue all free credits in one worker invocation
> drm/sched: Embed run queue singleton into the scheduler
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 27 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 5 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 8 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 8 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 8 +-
> drivers/gpu/drm/scheduler/Makefile | 2 +-
> drivers/gpu/drm/scheduler/sched_entity.c | 121 +--
> drivers/gpu/drm/scheduler/sched_fence.c | 2 +-
> drivers/gpu/drm/scheduler/sched_internal.h | 114 ++-
> drivers/gpu/drm/scheduler/sched_main.c | 570 +++---------
> drivers/gpu/drm/scheduler/sched_rq.c | 214 +++++
> drivers/gpu/drm/scheduler/tests/Makefile | 3 +-
> .../gpu/drm/scheduler/tests/tests_scheduler.c | 815 ++++++++++++++++++
> include/drm/gpu_scheduler.h | 23 +-
> 15 files changed, 1348 insertions(+), 578 deletions(-)
> create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c
> create mode 100644 drivers/gpu/drm/scheduler/tests/tests_scheduler.c
>
^ permalink raw reply [flat|nested] 35+ messages in thread
Thread overview: 35+ messages
2025-04-25 10:20 [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 01/16] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
2025-04-29 15:03 ` Christian König
2025-04-29 15:45 ` Michel Dänzer
2025-04-29 15:52 ` Christian König
2025-04-25 10:20 ` [RFC v4 02/16] drm/sched: Add some more " Tvrtko Ursulin
2025-04-29 15:07 ` Christian König
2025-04-25 10:20 ` [RFC v4 03/16] drm/sched: De-clutter drm_sched_init Tvrtko Ursulin
2025-04-29 15:16 ` Christian König
2025-04-25 10:20 ` [RFC v4 04/16] drm/sched: Avoid double re-lock on the job free path Tvrtko Ursulin
2025-05-12 12:49 ` Philipp Stanner
2025-05-12 12:57 ` Matthew Brost
2025-05-14 8:54 ` Tvrtko Ursulin
2025-05-14 8:46 ` Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 05/16] drm/sched: Consolidate drm_sched_job_timedout Tvrtko Ursulin
2025-05-12 12:53 ` Philipp Stanner
2025-05-14 8:57 ` Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 06/16] drm/sched: Consolidate drm_sched_rq_select_entity_rr Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 07/16] drm/sched: Implement RR via FIFO Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 08/16] drm/sched: Consolidate entity run queue management Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 09/16] drm/sched: Move run queue related code into a separate file Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 10/16] drm/sched: Free all finished jobs at once Tvrtko Ursulin
2025-05-12 12:56 ` Philipp Stanner
2025-05-14 9:00 ` Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 11/16] drm/sched: Account entity GPU time Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 12/16] drm/sched: Remove idle entity from tree Tvrtko Ursulin
2025-05-12 13:03 ` Philipp Stanner
2025-05-14 9:22 ` Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 13/16] drm/sched: Add fair scheduling policy Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 14/16] drm/sched: Remove FIFO and RR and simplify to a single run queue Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 15/16] drm/sched: Queue all free credits in one worker invocation Tvrtko Ursulin
2025-04-25 10:20 ` [RFC v4 16/16] drm/sched: Embed run queue singleton into the scheduler Tvrtko Ursulin
2025-05-12 13:05 ` Philipp Stanner
2025-04-29 7:25 ` [RFC v4 00/16] Fair DRM scheduler Tvrtko Ursulin
2025-05-19 16:51 ` Pierre-Eric Pelloux-Prayer