* [RFC 0/3] Simpler and more consistent handling of entities on missing hw blocks
@ 2026-01-07 12:43 Tvrtko Ursulin
2026-01-07 12:43 ` [RFC 1/3] drm/amdgpu: Reject impossible entities early Tvrtko Ursulin
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Tvrtko Ursulin @ 2026-01-07 12:43 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost,
Philipp Stanner, Tvrtko Ursulin
Trying to consolidate how entity creation and/or submission is rejected when
underlying hw ip block is missing.
Amdgpu appears to be the only user which can pass zero schedulers, or a single
NULL scheduler to drm_sched_entity_init(), so if I did not miss something subtle
and the first two patches are okay, then the last one removes handling for that
special case from the DRM scheduler.
Tvrtko Ursulin (3):
drm/amdgpu: Reject impossible entities early
drm/amdgpu: Remove redundant missing hw ip handling
drm/sched: Disallow initializing entities with no schedulers
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 7 -------
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +++++
drivers/gpu/drm/scheduler/sched_entity.c | 13 ++++---------
3 files changed, 9 insertions(+), 16 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [RFC 1/3] drm/amdgpu: Reject impossible entities early
2026-01-07 12:43 [RFC 0/3] Simpler and more consistent handling of entities on missing hw blocks Tvrtko Ursulin
@ 2026-01-07 12:43 ` Tvrtko Ursulin
2026-01-08 13:13 ` Christian König
2026-01-07 12:43 ` [RFC 2/3] drm/amdgpu: Remove redundant missing hw ip handling Tvrtko Ursulin
2026-01-07 12:43 ` [RFC 3/3] drm/sched: Disallow initializing entities with no schedulers Tvrtko Ursulin
2 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2026-01-07 12:43 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost,
Philipp Stanner, Tvrtko Ursulin
Currently there are two different behaviour modes when userspace tries to
operate on not present HW IP blocks. On a machine without UVD, VCE and VPE
blocks, this can be observed for example like this:
$ sudo ./amd_fuzzing --r cs-wait-fuzzing
...
amd_cs_wait_fuzzing DRM_IOCTL_AMDGPU_CTX r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_GFX r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_COMPUTE r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_DMA r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_UVD r -1
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCE r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_UVD_ENC r -1
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_DEC r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_ENC r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_JPEG r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VPE r 0
We can see that UVD returns an errno (-EINVAL) from the CS_WAIT ioctl,
while VCE and VPE return unexpected successes.
The difference stems from the fact the UVD is a load balancing engine
which retains the context, so with a workaround implemented in
amdgpu_ctx_init_entity(), but which does not account for the fact hardware
block may not be present.
This causes a single NULL scheduler to be passed to
drm_sched_entity_init(), which immediately rejects this with -EINVAL.
The not present VCE and VPE cases on the other hand pass zero schedulers
to drm_sched_entity_init(), which is explicitly allowed and results in
unusable entities.
As the UVD case however shows, call paths can handle the errors, so we can
consolidate this into a single path which will always return -EINVAL if
the HW IP block is not present.
We do this by rejecting it early and not calling drm_sched_entity_init()
when there is no backing hardware.
This also removes the need for the drm_sched_entity_init() to handle the
zero schedulers and NULL scheduler cases, which means that we can follow
up by removing the special casing from the DRM scheduler.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: f34e8bb7d6c6 ("drm/sched: fix null-ptr-deref in init entity")
Cc: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index afedea02188d..a5f85ea9fbb6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -239,6 +239,11 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, u32 hw_ip,
goto error_free_entity;
}
+ if (num_scheds == 0) {
+ r = -EINVAL;
+ goto error_free_entity;
+ }
+
/* disable load balance if the hw engine retains context among dependent jobs */
if (hw_ip == AMDGPU_HW_IP_VCN_ENC ||
hw_ip == AMDGPU_HW_IP_VCN_DEC ||
--
2.52.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC 2/3] drm/amdgpu: Remove redundant missing hw ip handling
2026-01-07 12:43 [RFC 0/3] Simpler and more consistent handling of entities on missing hw blocks Tvrtko Ursulin
2026-01-07 12:43 ` [RFC 1/3] drm/amdgpu: Reject impossible entities early Tvrtko Ursulin
@ 2026-01-07 12:43 ` Tvrtko Ursulin
2026-03-03 14:36 ` Christian König
2026-01-07 12:43 ` [RFC 3/3] drm/sched: Disallow initializing entities with no schedulers Tvrtko Ursulin
2 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2026-01-07 12:43 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost,
Philipp Stanner, Tvrtko Ursulin
Now that it is guaranteed there can be no entity if there is no hw ip
block we can remove the open coded protection during CS parsing.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: 55414ad5c983 ("drm/amdgpu: error out on entity with no run queue")
Cc: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index ecdfe6cb36cc..82bb70167f5a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -84,13 +84,6 @@ static int amdgpu_cs_job_idx(struct amdgpu_cs_parser *p,
if (r)
return r;
- /*
- * Abort if there is no run queue associated with this entity.
- * Possibly because of disabled HW IP.
- */
- if (entity->rq == NULL)
- return -EINVAL;
-
/* Check if we can add this IB to some existing job */
for (i = 0; i < p->gang_size; ++i)
if (p->entities[i] == entity)
--
2.52.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC 3/3] drm/sched: Disallow initializing entities with no schedulers
2026-01-07 12:43 [RFC 0/3] Simpler and more consistent handling of entities on missing hw blocks Tvrtko Ursulin
2026-01-07 12:43 ` [RFC 1/3] drm/amdgpu: Reject impossible entities early Tvrtko Ursulin
2026-01-07 12:43 ` [RFC 2/3] drm/amdgpu: Remove redundant missing hw ip handling Tvrtko Ursulin
@ 2026-01-07 12:43 ` Tvrtko Ursulin
2026-01-08 13:54 ` Philipp Stanner
2 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2026-01-07 12:43 UTC (permalink / raw)
To: amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost,
Philipp Stanner, Tvrtko Ursulin
Since we have removed the case where amdgpu was initializing entitites
with either no schedulers on the list, or with a single NULL scheduler,
and there appears no other drivers which rely on this, we can simplify the
scheduler by explictly rejecting that early.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
---
drivers/gpu/drm/scheduler/sched_entity.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index fe174a4857be..bb7e5fc47f99 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -61,32 +61,27 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
unsigned int num_sched_list,
atomic_t *guilty)
{
- if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
+ if (!entity || !sched_list || !num_sched_list || !sched_list[0])
return -EINVAL;
memset(entity, 0, sizeof(struct drm_sched_entity));
INIT_LIST_HEAD(&entity->list);
entity->rq = NULL;
entity->guilty = guilty;
- entity->num_sched_list = num_sched_list;
entity->priority = priority;
entity->last_user = current->group_leader;
- /*
- * It's perfectly valid to initialize an entity without having a valid
- * scheduler attached. It's just not valid to use the scheduler before it
- * is initialized itself.
- */
+ entity->num_sched_list = num_sched_list;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
RCU_INIT_POINTER(entity->last_scheduled, NULL);
RB_CLEAR_NODE(&entity->rb_tree_node);
- if (num_sched_list && !sched_list[0]->sched_rq) {
+ if (!sched_list[0]->sched_rq) {
/* Since every entry covered by num_sched_list
* should be non-NULL and therefore we warn drivers
* not to do this and to fix their DRM calling order.
*/
pr_warn("%s: called with uninitialized scheduler\n", __func__);
- } else if (num_sched_list) {
+ } else {
/* The "priority" of an entity cannot exceed the number of run-queues of a
* scheduler. Protect against num_rqs being 0, by converting to signed. Choose
* the lowest priority available.
--
2.52.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC 1/3] drm/amdgpu: Reject impossible entities early
2026-01-07 12:43 ` [RFC 1/3] drm/amdgpu: Reject impossible entities early Tvrtko Ursulin
@ 2026-01-08 13:13 ` Christian König
0 siblings, 0 replies; 12+ messages in thread
From: Christian König @ 2026-01-08 13:13 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Danilo Krummrich, Matthew Brost, Philipp Stanner
On 1/7/26 13:43, Tvrtko Ursulin wrote:
> Currently there are two different behaviour modes when userspace tries to
> operate on not present HW IP blocks. On a machine without UVD, VCE and VPE
> blocks, this can be observed for example like this:
>
> $ sudo ./amd_fuzzing --r cs-wait-fuzzing
> ...
> amd_cs_wait_fuzzing DRM_IOCTL_AMDGPU_CTX r 0
> amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_GFX r 0
> amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_COMPUTE r 0
> amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_DMA r 0
> amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_UVD r -1
> amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCE r 0
> amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_UVD_ENC r -1
> amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_DEC r 0
> amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_ENC r 0
> amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_JPEG r 0
> amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VPE r 0
>
> We can see that UVD returns an errno (-EINVAL) from the CS_WAIT ioctl,
> while VCE and VPE return unexpected successes.
>
> The difference stems from the fact the UVD is a load balancing engine
> which retains the context, so with a workaround implemented in
> amdgpu_ctx_init_entity(), but which does not account for the fact hardware
> block may not be present.
>
> This causes a single NULL scheduler to be passed to
> drm_sched_entity_init(), which immediately rejects this with -EINVAL.
>
> The not present VCE and VPE cases on the other hand pass zero schedulers
> to drm_sched_entity_init(), which is explicitly allowed and results in
> unusable entities.
>
> As the UVD case however shows, call paths can handle the errors, so we can
> consolidate this into a single path which will always return -EINVAL if
> the HW IP block is not present.
>
> We do this by rejecting it early and not calling drm_sched_entity_init()
> when there is no backing hardware.
>
> This also removes the need for the drm_sched_entity_init() to handle the
> zero schedulers and NULL scheduler cases, which means that we can follow
> up by removing the special casing from the DRM scheduler.
Yeah I remember that Luben looked into that issue at some point as well.
IIRC the solution back then looked similar to this here, but we had to revert that at some point because it caused issues.
Could be that those issues are now solved, but even if not applying the patches and fixing the fallout is probably the more reasonable thing to do.
So Reviewed-by: Christian König <christian.koenig@amd.com> for the entire series.
Regards,
Christian.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> References: f34e8bb7d6c6 ("drm/sched: fix null-ptr-deref in init entity")
> Cc: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index afedea02188d..a5f85ea9fbb6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -239,6 +239,11 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, u32 hw_ip,
> goto error_free_entity;
> }
>
> + if (num_scheds == 0) {
> + r = -EINVAL;
> + goto error_free_entity;
> + }
> +
> /* disable load balance if the hw engine retains context among dependent jobs */
> if (hw_ip == AMDGPU_HW_IP_VCN_ENC ||
> hw_ip == AMDGPU_HW_IP_VCN_DEC ||
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC 3/3] drm/sched: Disallow initializing entities with no schedulers
2026-01-07 12:43 ` [RFC 3/3] drm/sched: Disallow initializing entities with no schedulers Tvrtko Ursulin
@ 2026-01-08 13:54 ` Philipp Stanner
2026-01-12 10:29 ` Tvrtko Ursulin
0 siblings, 1 reply; 12+ messages in thread
From: Philipp Stanner @ 2026-01-08 13:54 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost,
Philipp Stanner
What's the merge plan for this series? Christian?
On Wed, 2026-01-07 at 12:43 +0000, Tvrtko Ursulin wrote:
> Since we have removed the case where amdgpu was initializing entitites
> with either no schedulers on the list, or with a single NULL scheduler,
> and there appears no other drivers which rely on this, we can simplify the
> scheduler by explictly rejecting that early.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> ---
> drivers/gpu/drm/scheduler/sched_entity.c | 13 ++++---------
> 1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index fe174a4857be..bb7e5fc47f99 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -61,32 +61,27 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
> unsigned int num_sched_list,
> atomic_t *guilty)
> {
> - if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
> + if (!entity || !sched_list || !num_sched_list || !sched_list[0])
I personally am a fan of checking integers explicitly against a number,
which would make the diff a bit more straightforward, too. But I accept
that like that is common kernel practice.
> return -EINVAL;
>
> memset(entity, 0, sizeof(struct drm_sched_entity));
> INIT_LIST_HEAD(&entity->list);
> entity->rq = NULL;
> entity->guilty = guilty;
> - entity->num_sched_list = num_sched_list;
> entity->priority = priority;
> entity->last_user = current->group_leader;
> - /*
> - * It's perfectly valid to initialize an entity without having a valid
> - * scheduler attached. It's just not valid to use the scheduler before it
> - * is initialized itself.
> - */
> + entity->num_sched_list = num_sched_list;
Why do you move that line downwards? Just leave it where it was?
num_sched_list isn't changed or anything, so I don't see a logical
connection to the line below so that grouping would make sense.
With that:
Acked-by: Philipp Stanner <phasta@kernel.org>
P.
> entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
> RCU_INIT_POINTER(entity->last_scheduled, NULL);
> RB_CLEAR_NODE(&entity->rb_tree_node);
>
> - if (num_sched_list && !sched_list[0]->sched_rq) {
> + if (!sched_list[0]->sched_rq) {
> /* Since every entry covered by num_sched_list
> * should be non-NULL and therefore we warn drivers
> * not to do this and to fix their DRM calling order.
> */
> pr_warn("%s: called with uninitialized scheduler\n", __func__);
> - } else if (num_sched_list) {
> + } else {
> /* The "priority" of an entity cannot exceed the number of run-queues of a
> * scheduler. Protect against num_rqs being 0, by converting to signed. Choose
> * the lowest priority available.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC 3/3] drm/sched: Disallow initializing entities with no schedulers
2026-01-08 13:54 ` Philipp Stanner
@ 2026-01-12 10:29 ` Tvrtko Ursulin
2026-01-12 11:33 ` Danilo Krummrich
2026-01-12 12:49 ` Philipp Stanner
0 siblings, 2 replies; 12+ messages in thread
From: Tvrtko Ursulin @ 2026-01-12 10:29 UTC (permalink / raw)
To: phasta, amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost
On 08/01/2026 13:54, Philipp Stanner wrote:
> What's the merge plan for this series? Christian?
It sounds that staged merge would be safest. First two patches could go
to amd-next and if everything will look fine, I would follow up by
sending the DRM scheduler patch once amdgpu patches land to drm-next.
Or if DRM scheduler maintainers are happy for the DRM scheduler patch to
also go via amd-next that is another option.
> On Wed, 2026-01-07 at 12:43 +0000, Tvrtko Ursulin wrote:
>> Since we have removed the case where amdgpu was initializing entitites
>> with either no schedulers on the list, or with a single NULL scheduler,
>> and there appears no other drivers which rely on this, we can simplify the
>> scheduler by explictly rejecting that early.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Danilo Krummrich <dakr@kernel.org>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
>> ---
>> drivers/gpu/drm/scheduler/sched_entity.c | 13 ++++---------
>> 1 file changed, 4 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>> index fe174a4857be..bb7e5fc47f99 100644
>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>> @@ -61,32 +61,27 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>> unsigned int num_sched_list,
>> atomic_t *guilty)
>> {
>> - if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
>> + if (!entity || !sched_list || !num_sched_list || !sched_list[0])
>
> I personally am a fan of checking integers explicitly against a number,
> which would make the diff a bit more straightforward, too. But I accept
> that like that is common kernel practice.
>
>> return -EINVAL;
>>
>> memset(entity, 0, sizeof(struct drm_sched_entity));
>> INIT_LIST_HEAD(&entity->list);
>> entity->rq = NULL;
>> entity->guilty = guilty;
>> - entity->num_sched_list = num_sched_list;
>> entity->priority = priority;
>> entity->last_user = current->group_leader;
>> - /*
>> - * It's perfectly valid to initialize an entity without having a valid
>> - * scheduler attached. It's just not valid to use the scheduler before it
>> - * is initialized itself.
>> - */
>> + entity->num_sched_list = num_sched_list;
>
> Why do you move that line downwards? Just leave it where it was?
> num_sched_list isn't changed or anything, so I don't see a logical
> connection to the line below so that grouping would make sense.
It looks completely logical to me to have both lines dealing with the
same scheduler list, accessing the same input parameter even, next to
each other:
entity->num_sched_list = num_sched_list;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
No? In other words, I can respin if you insist but I don't see the need.
Regards,
Tvrtko
>
> With that:
> Acked-by: Philipp Stanner <phasta@kernel.org>
>
>
> P.
>
>> entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
>> RCU_INIT_POINTER(entity->last_scheduled, NULL);
>> RB_CLEAR_NODE(&entity->rb_tree_node);
>>
>> - if (num_sched_list && !sched_list[0]->sched_rq) {
>> + if (!sched_list[0]->sched_rq) {
>> /* Since every entry covered by num_sched_list
>> * should be non-NULL and therefore we warn drivers
>> * not to do this and to fix their DRM calling order.
>> */
>> pr_warn("%s: called with uninitialized scheduler\n", __func__);
>> - } else if (num_sched_list) {
>> + } else {
>> /* The "priority" of an entity cannot exceed the number of run-queues of a
>> * scheduler. Protect against num_rqs being 0, by converting to signed. Choose
>> * the lowest priority available.
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC 3/3] drm/sched: Disallow initializing entities with no schedulers
2026-01-12 10:29 ` Tvrtko Ursulin
@ 2026-01-12 11:33 ` Danilo Krummrich
2026-01-12 12:49 ` Philipp Stanner
1 sibling, 0 replies; 12+ messages in thread
From: Danilo Krummrich @ 2026-01-12 11:33 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: phasta, amd-gfx, dri-devel, kernel-dev, Christian König,
Matthew Brost
On Mon Jan 12, 2026 at 11:29 AM CET, Tvrtko Ursulin wrote:
> It looks completely logical to me to have both lines dealing with the
> same scheduler list, accessing the same input parameter even, next to
> each other:
>
> entity->num_sched_list = num_sched_list;
> entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
>
> No?
Agreed! Yet, please don't add unrelated changes to patches.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC 3/3] drm/sched: Disallow initializing entities with no schedulers
2026-01-12 10:29 ` Tvrtko Ursulin
2026-01-12 11:33 ` Danilo Krummrich
@ 2026-01-12 12:49 ` Philipp Stanner
2026-01-12 12:51 ` Christian König
1 sibling, 1 reply; 12+ messages in thread
From: Philipp Stanner @ 2026-01-12 12:49 UTC (permalink / raw)
To: Tvrtko Ursulin, phasta, amd-gfx, dri-devel
Cc: kernel-dev, Christian König, Danilo Krummrich, Matthew Brost
On Mon, 2026-01-12 at 10:29 +0000, Tvrtko Ursulin wrote:
>
> On 08/01/2026 13:54, Philipp Stanner wrote:
> > What's the merge plan for this series? Christian?
>
> It sounds that staged merge would be safest. First two patches could go
> to amd-next and if everything will look fine, I would follow up by
> sending the DRM scheduler patch once amdgpu patches land to drm-next.
Works for me.
>
> Or if DRM scheduler maintainers are happy for the DRM scheduler patch to
> also go via amd-next that is another option.
> > On Wed, 2026-01-07 at 12:43 +0000, Tvrtko Ursulin wrote:
> > > Since we have removed the case where amdgpu was initializing entitites
> > > with either no schedulers on the list, or with a single NULL scheduler,
> > > and there appears no other drivers which rely on this, we can simplify the
> > > scheduler by explictly rejecting that early.
> > >
> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> > > Cc: Christian König <christian.koenig@amd.com>
> > > Cc: Danilo Krummrich <dakr@kernel.org>
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Cc: Philipp Stanner <phasta@kernel.org>
> > > ---
> > > drivers/gpu/drm/scheduler/sched_entity.c | 13 ++++---------
> > > 1 file changed, 4 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > > index fe174a4857be..bb7e5fc47f99 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > > @@ -61,32 +61,27 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
> > > unsigned int num_sched_list,
> > > atomic_t *guilty)
> > > {
> > > - if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
> > > + if (!entity || !sched_list || !num_sched_list || !sched_list[0])
> >
> > I personally am a fan of checking integers explicitly against a number,
> > which would make the diff a bit more straightforward, too. But I accept
> > that like that is common kernel practice.
> >
> > > return -EINVAL;
> > >
> > > memset(entity, 0, sizeof(struct drm_sched_entity));
> > > INIT_LIST_HEAD(&entity->list);
> > > entity->rq = NULL;
> > > entity->guilty = guilty;
> > > - entity->num_sched_list = num_sched_list;
> > > entity->priority = priority;
> > > entity->last_user = current->group_leader;
> > > - /*
> > > - * It's perfectly valid to initialize an entity without having a valid
> > > - * scheduler attached. It's just not valid to use the scheduler before it
> > > - * is initialized itself.
> > > - */
> > > + entity->num_sched_list = num_sched_list;
> >
> > Why do you move that line downwards? Just leave it where it was?
> > num_sched_list isn't changed or anything, so I don't see a logical
> > connection to the line below so that grouping would make sense.
>
> It looks completely logical to me to have both lines dealing with the
> same scheduler list, accessing the same input parameter even, next to
> each other:
>
> entity->num_sched_list = num_sched_list;
> entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
>
> No? In other words, I can respin if you insist but I don't see the need.
Fine by me. Though a little sentence about that cosmetical change in
the commit message would have made that clearer.
Greetings
P.
>
> Regards,
>
> Tvrtko
>
> >
> > With that:
> > Acked-by: Philipp Stanner <phasta@kernel.org>
> >
> >
> > P.
> >
> > > entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
> > > RCU_INIT_POINTER(entity->last_scheduled, NULL);
> > > RB_CLEAR_NODE(&entity->rb_tree_node);
> > >
> > > - if (num_sched_list && !sched_list[0]->sched_rq) {
> > > + if (!sched_list[0]->sched_rq) {
> > > /* Since every entry covered by num_sched_list
> > > * should be non-NULL and therefore we warn drivers
> > > * not to do this and to fix their DRM calling order.
> > > */
> > > pr_warn("%s: called with uninitialized scheduler\n", __func__);
> > > - } else if (num_sched_list) {
> > > + } else {
> > > /* The "priority" of an entity cannot exceed the number of run-queues of a
> > > * scheduler. Protect against num_rqs being 0, by converting to signed. Choose
> > > * the lowest priority available.
> >
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC 3/3] drm/sched: Disallow initializing entities with no schedulers
2026-01-12 12:49 ` Philipp Stanner
@ 2026-01-12 12:51 ` Christian König
2026-01-22 8:52 ` Tvrtko Ursulin
0 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2026-01-12 12:51 UTC (permalink / raw)
To: phasta, Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Danilo Krummrich, Matthew Brost
On 1/12/26 13:49, Philipp Stanner wrote:
> On Mon, 2026-01-12 at 10:29 +0000, Tvrtko Ursulin wrote:
>>
>> On 08/01/2026 13:54, Philipp Stanner wrote:
>>> What's the merge plan for this series? Christian?
>>
>> It sounds that staged merge would be safest. First two patches could go
>> to amd-next and if everything will look fine, I would follow up by
>> sending the DRM scheduler patch once amdgpu patches land to drm-next.
>
> Works for me.
Sounds like a plan to me as well.
Regards,
Christian.
>
>>
>> Or if DRM scheduler maintainers are happy for the DRM scheduler patch to
>> also go via amd-next that is another option.
>> > On Wed, 2026-01-07 at 12:43 +0000, Tvrtko Ursulin wrote:
>>>> Since we have removed the case where amdgpu was initializing entitites
>>>> with either no schedulers on the list, or with a single NULL scheduler,
>>>> and there appears no other drivers which rely on this, we can simplify the
>>>> scheduler by explictly rejecting that early.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Cc: Danilo Krummrich <dakr@kernel.org>
>>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>>> Cc: Philipp Stanner <phasta@kernel.org>
>>>> ---
>>>> drivers/gpu/drm/scheduler/sched_entity.c | 13 ++++---------
>>>> 1 file changed, 4 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> index fe174a4857be..bb7e5fc47f99 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> @@ -61,32 +61,27 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>>>> unsigned int num_sched_list,
>>>> atomic_t *guilty)
>>>> {
>>>> - if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
>>>> + if (!entity || !sched_list || !num_sched_list || !sched_list[0])
>>>
>>> I personally am a fan of checking integers explicitly against a number,
>>> which would make the diff a bit more straightforward, too. But I accept
>>> that like that is common kernel practice.
>>>
>>>> return -EINVAL;
>>>>
>>>> memset(entity, 0, sizeof(struct drm_sched_entity));
>>>> INIT_LIST_HEAD(&entity->list);
>>>> entity->rq = NULL;
>>>> entity->guilty = guilty;
>>>> - entity->num_sched_list = num_sched_list;
>>>> entity->priority = priority;
>>>> entity->last_user = current->group_leader;
>>>> - /*
>>>> - * It's perfectly valid to initialize an entity without having a valid
>>>> - * scheduler attached. It's just not valid to use the scheduler before it
>>>> - * is initialized itself.
>>>> - */
>>>> + entity->num_sched_list = num_sched_list;
>>>
>>> Why do you move that line downwards? Just leave it where it was?
>>> num_sched_list isn't changed or anything, so I don't see a logical
>>> connection to the line below so that grouping would make sense.
>>
>> It looks completely logical to me to have both lines dealing with the
>> same scheduler list, accessing the same input parameter even, next to
>> each other:
>>
>> entity->num_sched_list = num_sched_list;
>> entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
>>
>> No? In other words, I can respin if you insist but I don't see the need.
>
> Fine by me. Though a little sentence about that cosmetical change in
> the commit message would have made that clearer.
>
>
> Greetings
> P.
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>>
>>> With that:
>>> Acked-by: Philipp Stanner <phasta@kernel.org>
>>>
>>>
>>> P.
>>>
>>>> entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
>>>> RCU_INIT_POINTER(entity->last_scheduled, NULL);
>>>> RB_CLEAR_NODE(&entity->rb_tree_node);
>>>>
>>>> - if (num_sched_list && !sched_list[0]->sched_rq) {
>>>> + if (!sched_list[0]->sched_rq) {
>>>> /* Since every entry covered by num_sched_list
>>>> * should be non-NULL and therefore we warn drivers
>>>> * not to do this and to fix their DRM calling order.
>>>> */
>>>> pr_warn("%s: called with uninitialized scheduler\n", __func__);
>>>> - } else if (num_sched_list) {
>>>> + } else {
>>>> /* The "priority" of an entity cannot exceed the number of run-queues of a
>>>> * scheduler. Protect against num_rqs being 0, by converting to signed. Choose
>>>> * the lowest priority available.
>>>
>>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC 3/3] drm/sched: Disallow initializing entities with no schedulers
2026-01-12 12:51 ` Christian König
@ 2026-01-22 8:52 ` Tvrtko Ursulin
0 siblings, 0 replies; 12+ messages in thread
From: Tvrtko Ursulin @ 2026-01-22 8:52 UTC (permalink / raw)
To: Christian König, phasta, amd-gfx, dri-devel
Cc: kernel-dev, Danilo Krummrich, Matthew Brost
On 12/01/2026 12:51, Christian König wrote:
> On 1/12/26 13:49, Philipp Stanner wrote:
>> On Mon, 2026-01-12 at 10:29 +0000, Tvrtko Ursulin wrote:
>>>
>>> On 08/01/2026 13:54, Philipp Stanner wrote:
>>>> What's the merge plan for this series? Christian?
>>>
>>> It sounds that staged merge would be safest. First two patches could go
>>> to amd-next and if everything will look fine, I would follow up by
>>> sending the DRM scheduler patch once amdgpu patches land to drm-next.
>>
>> Works for me.
>
> Sounds like a plan to me as well.
Would you be able to test the first two patches with the AMD test suite?
Or put it into amd-staging-next if it would be automatically validated
there?
Or even all three patches.
On the overall I am keen to get a clear picture whether last patch will
be a go or no-go because my fair scheduler rewrite conflicts with it.
Regards,
Tvrtko
>>> Or if DRM scheduler maintainers are happy for the DRM scheduler patch to
>>> also go via amd-next that is another option.
>>> > On Wed, 2026-01-07 at 12:43 +0000, Tvrtko Ursulin wrote:
>>>>> Since we have removed the case where amdgpu was initializing entitites
>>>>> with either no schedulers on the list, or with a single NULL scheduler,
>>>>> and there appears no other drivers which rely on this, we can simplify the
>>>>> scheduler by explictly rejecting that early.
>>>>>
>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>>> Cc: Christian König <christian.koenig@amd.com>
>>>>> Cc: Danilo Krummrich <dakr@kernel.org>
>>>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>>>> Cc: Philipp Stanner <phasta@kernel.org>
>>>>> ---
>>>>> drivers/gpu/drm/scheduler/sched_entity.c | 13 ++++---------
>>>>> 1 file changed, 4 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> index fe174a4857be..bb7e5fc47f99 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> @@ -61,32 +61,27 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>>>>> unsigned int num_sched_list,
>>>>> atomic_t *guilty)
>>>>> {
>>>>> - if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
>>>>> + if (!entity || !sched_list || !num_sched_list || !sched_list[0])
>>>>
>>>> I personally am a fan of checking integers explicitly against a number,
>>>> which would make the diff a bit more straightforward, too. But I accept
>>>> that like that is common kernel practice.
>>>>
>>>>> return -EINVAL;
>>>>>
>>>>> memset(entity, 0, sizeof(struct drm_sched_entity));
>>>>> INIT_LIST_HEAD(&entity->list);
>>>>> entity->rq = NULL;
>>>>> entity->guilty = guilty;
>>>>> - entity->num_sched_list = num_sched_list;
>>>>> entity->priority = priority;
>>>>> entity->last_user = current->group_leader;
>>>>> - /*
>>>>> - * It's perfectly valid to initialize an entity without having a valid
>>>>> - * scheduler attached. It's just not valid to use the scheduler before it
>>>>> - * is initialized itself.
>>>>> - */
>>>>> + entity->num_sched_list = num_sched_list;
>>>>
>>>> Why do you move that line downwards? Just leave it where it was?
>>>> num_sched_list isn't changed or anything, so I don't see a logical
>>>> connection to the line below so that grouping would make sense.
>>>
>>> It looks completely logical to me to have both lines dealing with the
>>> same scheduler list, accessing the same input parameter even, next to
>>> each other:
>>>
>>> entity->num_sched_list = num_sched_list;
>>> entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
>>>
>>> No? In other words, I can respin if you insist but I don't see the need.
>>
>> Fine by me. Though a little sentence about that cosmetical change in
>> the commit message would have made that clearer.
>>
>>
>> Greetings
>> P.
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>>
>>>> With that:
>>>> Acked-by: Philipp Stanner <phasta@kernel.org>
>>>>
>>>>
>>>> P.
>>>>
>>>>> entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
>>>>> RCU_INIT_POINTER(entity->last_scheduled, NULL);
>>>>> RB_CLEAR_NODE(&entity->rb_tree_node);
>>>>>
>>>>> - if (num_sched_list && !sched_list[0]->sched_rq) {
>>>>> + if (!sched_list[0]->sched_rq) {
>>>>> /* Since every entry covered by num_sched_list
>>>>> * should be non-NULL and therefore we warn drivers
>>>>> * not to do this and to fix their DRM calling order.
>>>>> */
>>>>> pr_warn("%s: called with uninitialized scheduler\n", __func__);
>>>>> - } else if (num_sched_list) {
>>>>> + } else {
>>>>> /* The "priority" of an entity cannot exceed the number of run-queues of a
>>>>> * scheduler. Protect against num_rqs being 0, by converting to signed. Choose
>>>>> * the lowest priority available.
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC 2/3] drm/amdgpu: Remove redundant missing hw ip handling
2026-01-07 12:43 ` [RFC 2/3] drm/amdgpu: Remove redundant missing hw ip handling Tvrtko Ursulin
@ 2026-03-03 14:36 ` Christian König
0 siblings, 0 replies; 12+ messages in thread
From: Christian König @ 2026-03-03 14:36 UTC (permalink / raw)
To: Tvrtko Ursulin, amd-gfx, dri-devel
Cc: kernel-dev, Danilo Krummrich, Matthew Brost, Philipp Stanner
On 1/7/26 13:43, Tvrtko Ursulin wrote:
> Now that it is guaranteed there can be no entity if there is no hw ip
> block we can remove the open coded protection during CS parsing.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> References: 55414ad5c983 ("drm/amdgpu: error out on entity with no run queue")
> Cc: Christian König <christian.koenig@amd.com>
I've gone ahead, reviewed this one here as well and pushed the first two patches into our internal amd-staging-drm-next branch.
It should appear public in the next few days if our CI system is happy with it.
Thanks,
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 7 -------
> 1 file changed, 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index ecdfe6cb36cc..82bb70167f5a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -84,13 +84,6 @@ static int amdgpu_cs_job_idx(struct amdgpu_cs_parser *p,
> if (r)
> return r;
>
> - /*
> - * Abort if there is no run queue associated with this entity.
> - * Possibly because of disabled HW IP.
> - */
> - if (entity->rq == NULL)
> - return -EINVAL;
> -
> /* Check if we can add this IB to some existing job */
> for (i = 0; i < p->gang_size; ++i)
> if (p->entities[i] == entity)
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-03-03 14:36 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-07 12:43 [RFC 0/3] Simpler and more consistent handling of entities on missing hw blocks Tvrtko Ursulin
2026-01-07 12:43 ` [RFC 1/3] drm/amdgpu: Reject impossible entities early Tvrtko Ursulin
2026-01-08 13:13 ` Christian König
2026-01-07 12:43 ` [RFC 2/3] drm/amdgpu: Remove redundant missing hw ip handling Tvrtko Ursulin
2026-03-03 14:36 ` Christian König
2026-01-07 12:43 ` [RFC 3/3] drm/sched: Disallow initializing entities with no schedulers Tvrtko Ursulin
2026-01-08 13:54 ` Philipp Stanner
2026-01-12 10:29 ` Tvrtko Ursulin
2026-01-12 11:33 ` Danilo Krummrich
2026-01-12 12:49 ` Philipp Stanner
2026-01-12 12:51 ` Christian König
2026-01-22 8:52 ` Tvrtko Ursulin
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.