dri-devel.lists.freedesktop.org archive mirror
From: Philipp Stanner <phasta@mailbox.org>
To: "Pierre-Eric Pelloux-Prayer" <pierre-eric@damsy.net>,
	phasta@kernel.org,
	"Pierre-Eric Pelloux-Prayer" <pierre-eric.pelloux-prayer@amd.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 2/2] drm/sched: limit sched score update to jobs change
Date: Tue, 02 Sep 2025 08:21:25 +0200	[thread overview]
Message-ID: <d2cda90644e717a781b98a90b85b1464cbd0b11e.camel@mailbox.org> (raw)
In-Reply-To: <b1a8f459-93dd-4b6c-b29e-c68fba19f6fc@damsy.net>

On Mon, 2025-09-01 at 15:14 +0200, Pierre-Eric Pelloux-Prayer wrote:
> 
> 
> Le 25/08/2025 à 15:13, Philipp Stanner a écrit :
> > On Fri, 2025-08-22 at 15:43 +0200, Pierre-Eric Pelloux-Prayer wrote:
> > > Currently, the scheduler score is incremented when a job is pushed to an
> > > entity and when an entity is attached to the scheduler.
> > 
> > It's indeed awkward why attaching is treated equivalently to job
> > submission.
> > 
> > Can you expand the documentation for drm_sched_init_args a bit so that
> > it gets clearer what the score is supposed to do?
> 
> 
> drm_sched_init_args.score is the feature allowing multiple schedulers to share a 
> score so I suppose you meant drm_gpu_scheduler.score?

I mean both.

struct drm_sched_init_args has a @score which is passed by the driver
during init and is then stored in drm_gpu_scheduler.score.

The documentation should be improved for both.

> 
> The doc currently says "score to help loadbalancer pick a idle sched" which is a 
> bit vague. It could be modified to become:
> 
>    @score: holds the number of yet-to-be-completed jobs pushed to each scheduler.
>            It's used when load balancing between different schedulers.

"to each scheduler" reads a bit as if each scheduler has an individual
score. Is it used that way, though? I think it's a pointer because all
schedulers share this atomic. Thus, it "holds the total number of yet-
to-be-completed jobs for all schedulers sharing this atomic", right?

> 
> What do you think?
> 
> > 
> > > 
> > > This leads to some bad scheduling decision where the score value is
> > > largely made of idle entities.
> > > 
> > > For instance, a scenario with 2 schedulers and where 10 entities submit
> > > a single job, then do nothing, each scheduler will probably end up with
> > > a score of 5.
> > > Now, 5 userspace apps exit, so their entities will be dropped.
> > > 
> > 
> > "entities will be dropped" == "drm_sched_entity_kill() gets called",
> > right?
> 
> Yes.

Then a nit: better to say "their entities will be killed" instead of
"dropped".

> 
> > > In
> > > the worst case, these apps' entities where all attached to the same
> > 
> > s/where/were
> > 
> > or better yet: "could be"
> 
> Will fix, thanks.
> 
> > 
> > > scheduler and we end up with score=5 (the 5 remaining entities) and
> > > score=0, despite the 2 schedulers being idle.
> > 
> > Sounds indeed like a (small) problem to me.
> > 
> > 
> > > When new entities show up, they will all select the second scheduler
> > > based on its low score value, instead of alternating between the 2.
> > > 
> > > Some amdgpu rings depended on this feature, but the previous commit
> > > implemented the same thing in amdgpu directly so it can be safely
> > > removed from drm/sched.
> > 
> > Can we be that sure that other drivers don't depend on it, though? I
> > suspect it's likely that it's just amdgpu, but…
> > 
> 
> Aside from the new "rocket" driver, as pointed out by Tvrtko, amdgpu is the 
> only driver passing more than one scheduler to entities, so those are the only 
> ones that could be affected.
> 
> I verified amdgpu and Tvrtko pinged the rocket maintainers in the other thread.

Very good! Then let's give those guys a few days to jump into the
discussion.

> 
> > 
> > 
> > BTW, since you're cleaning up related stuff currently: I saw that it
> > seems that the only driver that sets &struct drm_sched_init_args.score
> > is amdgpu. Would be cool if you can take a look whether that's still
> > needed.
> 
> It cannot really be removed yet as it's useful when a single hardware block is 
> exposed through different schedulers (so pushing jobs to one of the schedulers 
> should increase the load of the underlying hw).

OK.


Thx
P.

> 
> Thanks,
> Pierre-Eric
> 
> > 
> > 
> > P.
> > 
> > > 
> > > Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> > > ---
> > >   drivers/gpu/drm/scheduler/sched_main.c | 2 --
> > >   1 file changed, 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 5a550fd76bf0..e6d232a8ec58 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -206,7 +206,6 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
> > >   	if (!list_empty(&entity->list))
> > >   		return;
> > >   
> > > -	atomic_inc(rq->sched->score);
> > >   	list_add_tail(&entity->list, &rq->entities);
> > >   }
> > >   
> > > @@ -228,7 +227,6 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> > >   
> > >   	spin_lock(&rq->lock);
> > >   
> > > -	atomic_dec(rq->sched->score);
> > >   	list_del_init(&entity->list);
> > >   
> > >   	if (rq->current_entity == entity)


Thread overview:
2025-08-22 13:43 [PATCH v1 1/2] drm/amdgpu: increment share sched score on entity selection Pierre-Eric Pelloux-Prayer
2025-08-22 13:43 ` [PATCH v1 2/2] drm/sched: limit sched score update to jobs change Pierre-Eric Pelloux-Prayer
2025-08-25 13:13   ` Philipp Stanner
2025-09-01 13:14     ` Pierre-Eric Pelloux-Prayer
2025-09-02  6:21       ` Philipp Stanner [this message]
2025-09-01  9:20   ` Tvrtko Ursulin
2025-09-05 13:39     ` Tomeu Vizoso
2025-09-01  9:02 ` [PATCH v1 1/2] drm/amdgpu: increment share sched score on entity selection Tvrtko Ursulin
2025-09-03  8:48   ` Pierre-Eric Pelloux-Prayer
2025-09-05 14:33 ` Christian König
