Re: [RFC PATCH] drm/sched: Add new KUnit test suite for concurrent job submission

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Marco Pagani <marco.pagani@linux.dev>
To: tursulin@ursulin.net
Cc: airlied@gmail.com, ckoenig.leichtzumerken@gmail.com,
	dakr@kernel.org, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, maarten.lankhorst@linux.intel.com,
	marpagan@redhat.com, matthew.brost@intel.com, mripard@kernel.org,
	phasta@kernel.org, simona@ffwll.ch, tzimmermann@suse.de
Subject: Re: [RFC PATCH] drm/sched: Add new KUnit test suite for concurrent job submission
Date: Wed, 28 Jan 2026 12:39:13 +0100	[thread overview]
Message-ID: <90cdb121-7ff2-43a5-9327-2898b5e65609@linux.dev> (raw)
In-Reply-To: <79cf2013-da6b-4653-aaa8-3e29a7d1ee3a@ursulin.net>


On 22/01/2026 10:51, Tvrtko Ursulin wrote:
> 
> On 20/01/2026 20:52, Marco Pagani wrote:
>> Add a new test suite to simulate concurrent job submissions from userspace.
>> The suite includes a basic test case where each worker submits a single
>> job, and a more advanced case involving the submission of multiple jobs.
> 
> New test coverage is welcome!

Hi Tvrtko, Philip, and thank you.

> But as Philipp has said some more context would be beneficial. Like are 
> you trying to hit a bug, or extend later with something which will hit a 
> bug and then you will propose improvements? Or simply improving the 
> coverage?

Sure, I'll extend the commit message to be more descriptive in the next
version.
 
> If it is about some race I would maybe consider putting this into a new 
> tests_races.c. I actually have this file locally and some unfinished 
> test cases already, although it is unclear when I will be happy with 
> them to post. But if the test is simply about adding coverage it is fine 
> to live in tests_basic.c.

The general idea is to extend the suite with some initial tests that stress
concurrency to spot race conditions. Having these initial tests grouped together
with future ones in a new tests_races.c file makes perfect sense to me.

>> Signed-off-by: Marco Pagani <marpagan@redhat.com>
>> ---
>>   drivers/gpu/drm/scheduler/tests/tests_basic.c | 175 ++++++++++++++++++
>>   1 file changed, 175 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c b/drivers/gpu/drm/scheduler/tests/tests_basic.c
>> index 82a41a456b0a..7c25bcbbe7c9 100644
>> --- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
>> +++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
>> @@ -2,6 +2,7 @@
>>   /* Copyright (c) 2025 Valve Corporation */
>>   
>>   #include <linux/delay.h>
>> +#include <linux/completion.h>
>>   
>>   #include "sched_tests.h"
>>   
>> @@ -235,6 +236,179 @@ static void drm_sched_basic_cancel(struct kunit *test)
>>   	KUNIT_ASSERT_EQ(test, job->hw_fence.error, -ECANCELED);
>>   }
>>   
>> +struct sched_concurrent_test_context {
>> +	struct drm_mock_scheduler *sched;
>> +	struct workqueue_struct *sub_wq;
>> +	struct completion wait_go;
>> +};
>> +
>> +KUNIT_DEFINE_ACTION_WRAPPER(destroy_workqueue_wrap, destroy_workqueue,
>> +			    struct workqueue_struct *);
>> +
>> +KUNIT_DEFINE_ACTION_WRAPPER(drm_mock_sched_fini_wrap, drm_mock_sched_fini,
>> +			    struct drm_mock_scheduler *);
>> +
>> +KUNIT_DEFINE_ACTION_WRAPPER(drm_mock_sched_entity_free_wrap, drm_mock_sched_entity_free,
>> +			    struct drm_mock_sched_entity *);
>> +
>> +static int drm_sched_concurrent_init(struct kunit *test)
>> +{
>> +	struct sched_concurrent_test_context *ctx;
>> +	int ret;
>> +
>> +	ctx = kunit_kzalloc(test, sizeof(*ctx), GFP_KERNEL);
>> +
>> +	init_completion(&ctx->wait_go);
>> +
>> +	ctx->sched = drm_mock_sched_new(test, MAX_SCHEDULE_TIMEOUT);
>> +
>> +	ret = kunit_add_action_or_reset(test, drm_mock_sched_fini_wrap, ctx->sched);
>> +	KUNIT_ASSERT_EQ(test, ret, 0);
>> +
>> +	/* Use an unbounded workqueue to maximize job submission concurrency */
>> +	ctx->sub_wq = alloc_workqueue("drm-sched-submitters-wq", WQ_UNBOUND,
>> +				      WQ_UNBOUND_MAX_ACTIVE);
>> +	KUNIT_ASSERT_NOT_NULL(test, ctx->sub_wq);
>> +
>> +	ret = kunit_add_action_or_reset(test, destroy_workqueue_wrap, ctx->sub_wq);
>> +	KUNIT_ASSERT_EQ(test, ret, 0);
>> +
>> +	test->priv = ctx;
>> +
>> +	return 0;
>> +}
>> +
>> +struct drm_sched_concurrent_params {
>> +	const char *description;
>> +	unsigned int job_base_us;
>> +	unsigned int num_jobs;
>> +	unsigned int num_subs;
>> +};
>> +
>> +static const struct drm_sched_concurrent_params drm_sched_concurrent_cases[] = {
>> +	{
>> +		.description = "Concurrently submit a single job in a single entity",
>> +		.job_base_us = 1000,
>> +		.num_jobs = 1,
>> +		.num_subs = 32,
>> +	},
> 
> Why is submission from a single thread interesting if it is already covered?

These two initial parameter sets cover only concurrent submission:
multiple submitters, single job / multiple submitters, multiple jobs.

>> +	{
>> +		.description = "Concurrently submit multiple jobs in a single entity",
>> +		.job_base_us = 1000,
>> +		.num_jobs = 10,
>> +		.num_subs = 64,
>> +	},
>> +};
>> +
>> +static void
>> +drm_sched_concurrent_desc(const struct drm_sched_concurrent_params *params, char *desc)
>> +{
>> +	strscpy(desc, params->description, KUNIT_PARAM_DESC_SIZE);
>> +}
>> +
>> +KUNIT_ARRAY_PARAM(drm_sched_concurrent, drm_sched_concurrent_cases, drm_sched_concurrent_desc);
>> +
>> +struct submitter_data {
>> +	struct work_struct work;
>> +	struct sched_concurrent_test_context *ctx;
>> +	struct drm_mock_sched_entity *entity;
>> +	struct drm_mock_sched_job **jobs;
>> +	struct kunit *test;
>> +	unsigned int id;
>> +	bool timedout;
>> +};
>> +
>> +static void drm_sched_submitter_worker(struct work_struct *work)
>> +{
>> +	const struct drm_sched_concurrent_params *params;
>> +	struct sched_concurrent_test_context *ctx;
>> +	struct submitter_data *sub_data;
>> +	unsigned int i, duration_us;
>> +	unsigned long timeout_jiffies;
>> +	bool done;
>> +
>> +	sub_data = container_of(work, struct submitter_data, work);
>> +	ctx = sub_data->ctx;
>> +	params = sub_data->test->param_value;
>> +
>> +	wait_for_completion(&ctx->wait_go);
>> +
>> +	for (i = 0; i < params->num_jobs; i++) {
>> +		duration_us = params->job_base_us + (sub_data->id * 10);
> 
> Why is job duration dependent by the submitter id?

Just a simple way to have a deterministic distribution of durations.
I can change it if it doesn't fit.

> Would it be feasiable to not use auto-completing jobs and instead 
> advance the timeline manually? Given how the premise of the test seems 
> to be about concurrent submission it sounds plausible that what happens 
> after submission maybe isn't very relevant.

Good idea! I'll run some experiments and see if it works.

>> +		drm_mock_sched_job_set_duration_us(sub_data->jobs[i], duration_us);
>> +		drm_mock_sched_job_submit(sub_data->jobs[i]);
> 
> On a related note, one interesting thing to add coverage for later is 
> multi-threaded submit of multiple jobs against a single entity. But it 
> is not an immediate need. Just mentioning it as something interesting.

Currently, the test configures each submitter to submit multiple jobs
against its own dedicated entity. I considered adding a test case for
submitting multiple jobs against multiple entities, but I decided to
leave it for the future.

> It would mean open coding drm_mock_sched_job_submit() as 
> drm_sched_job_arm() and drm_sched_entity_push_job() and sticking some 
> delay in between so two threads have the chance to interleave. Mock 
> scheduler does not handle it today, neither does the scheduler itself 
> who punts responsibility to callers. So adding a test and making the 
> mock scheduler handle that properly would serve as an example on how 
> scheduler must be used. Or what can go bad if it isn't.

Do you mean having multiple (k)threads submitting against the same entity?
Would that be used to model a multithread application that uses multiple queues?

>> +	}
>> +
>> +	timeout_jiffies = usecs_to_jiffies(params->job_base_us * params->num_subs *
>> +					   params->num_jobs * 10);
> 
> The timeout calculation could use a comment. You are using num_subs * 10 
> to match the duratiot_us above being id * 10? With logic of calculating 
> a pessimistic timeout?
> 
> Have you tried it with qemu to check if it is pessimistic enough?

I'll double check on that.
 
>> +	for (i = 0; i < params->num_jobs; i++) {
>> +		done = drm_mock_sched_job_wait_finished(sub_data->jobs[i],
>> +							timeout_jiffies);
>> +		if (!done)
>> +			sub_data->timedout = true;
>> +	}
> 
> Technically you only need to wait on the last job but it is passable 
> like this too.
> 
> Also, is it important for the worker to wait for completion or the main 
> thread could simply wait for everything? Maybe that would simplify things.

I would say they serve different purposes. The completion is used to pause
all worker threads until they are all created to ensure they start submitting
jobs together to maximize concurrency.
 
> Manual timeline advance and this combined would mean the workers only 
> submit jobs, while the main thread simply does 
> drm_mock_sched_advance(sched, num_subs * num_jobs) and waits for last 
> job from each submitter to finish.
> 
> Again, auto-completion and timeout reporting is something I do not 
> immediate see is relevant for multi-threaded submission testing.
>
> Maybe if you want to test the mock scheduler itself it could be, but 
> then I would add it as separate entry in drm_sched_concurrent_cases[]. 
> Like maybe have a flag/boolean "auto-complete jobs". So one without and 
> one with that set.

I think it's a good idea and I'll experiment to see if it works.
 
> Other than that it looks tidy and was easy to follow. Only thing which 
> slightly got me was the term "subs" since I don't intuitively associate 
> it with a submitter but, well, a sub entity of some kind. Might be worth 
> renaming that to submitter(s), or even dropping the prefix in some cases 
> might be feasible (like sub(s)_data).

Agreed. I'll rename "subs" for better clarity.

> Regards,
> 
> Tvrtko
> 

Cheers,
Marco

next prev parent reply	other threads:[~2026-01-28 11:45 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-20 20:52 [RFC PATCH] drm/sched: Add new KUnit test suite for concurrent job submission Marco Pagani
2026-01-22  7:13 ` Philipp Stanner
2026-01-22  9:51 ` Tvrtko Ursulin
2026-01-28 11:39   ` Marco Pagani [this message]
2026-01-28 16:17     ` Philipp Stanner
2026-01-29 15:44     ` Tvrtko Ursulin
2026-02-04 16:33       ` Marco Pagani
2026-02-05  9:53         ` Tvrtko Ursulin
2026-02-06 10:36           ` Marco Pagani
2026-02-06 12:12             ` Tvrtko Ursulin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=90cdb121-7ff2-43a5-9327-2898b5e65609@linux.dev \
    --to=marco.pagani@linux.dev \
    --cc=airlied@gmail.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=marpagan@redhat.com \
    --cc=matthew.brost@intel.com \
    --cc=mripard@kernel.org \
    --cc=phasta@kernel.org \
    --cc=simona@ffwll.ch \
    --cc=tursulin@ursulin.net \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.