Re: [PATCH v5 07/28] drm/sched: Account entity GPU time

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Danilo Krummrich" <dakr@kernel.org>
To: "Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>
Cc: amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	kernel-dev@igalia.com,
	"Christian König" <christian.koenig@amd.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Philipp Stanner" <phasta@kernel.org>
Subject: Re: [PATCH v5 07/28] drm/sched: Account entity GPU time
Date: Thu, 15 Jan 2026 14:18:52 +0100	[thread overview]
Message-ID: <DFP6Y7AJ8ET5.259R75H20S4LA@kernel.org> (raw)
In-Reply-To: <a9771f12-7d4b-4441-ab01-51fca86f53bd@igalia.com>

On Thu Jan 15, 2026 at 1:52 PM CET, Tvrtko Ursulin wrote:
> On 15/01/2026 12:06, Danilo Krummrich wrote:
>> On Thu Jan 15, 2026 at 9:56 AM CET, Tvrtko Ursulin wrote:
>>> On 14/01/2026 17:48, Danilo Krummrich wrote:
>>>> On Fri Dec 19, 2025 at 2:53 PM CET, Tvrtko Ursulin wrote:
>>>>> +/**
>>>>> + * drm_sched_entity_stats_job_add_gpu_time - Account job execution time to entity
>>>>> + * @job: Scheduler job to account.
>>>>> + *
>>>>> + * Accounts the execution time of @job to its respective entity stats object.
>>>>> + */
>>>>> +static inline void
>>>>> +drm_sched_entity_stats_job_add_gpu_time(struct drm_sched_job *job)
>>>>> +{
>>>>> +	struct drm_sched_entity_stats *stats = job->entity_stats;
>>>>> +	struct drm_sched_fence *s_fence = job->s_fence;
>>>>> +	ktime_t start, end;
>>>>> +
>>>>> +	start = dma_fence_timestamp(&s_fence->scheduled);
>>>>> +	end = dma_fence_timestamp(&s_fence->finished);
>>>>> +
>>>>> +	spin_lock(&stats->lock);
>>>>> +	stats->runtime = ktime_add(stats->runtime, ktime_sub(end, start));
>>>>> +	spin_unlock(&stats->lock);
>>>>> +}
>>>>
>>>> This shouldn't be an inline function in the header, please move to
>>>> sched_entity.c.
>>>
>>> It is not super pretty for a static inline but it was a pragmatic choice
>>> because it doesn't really belong to sched_entity.c. The whole entity
>>> stats object that is. Jobs and entities have only an association
>>> relationship to struct drm_sched_entity_stats. The only caller for this
>>> is even in sched_main.c while other updates are done in and from sched_rq.c.
>> 
>> But you put drm_sched_entity_stats_release() and drm_sched_entity_stats_alloc()
>> into sched_entity.c as well, I don't see how that is different.
>
> Indeed I have. Must have had a different reason back when I wrote it. I 
> will move it.

Just to be clear, please leave those in sched_entity.c and move
drm_sched_entity_stats_job_add_gpu_time() in there as well.

>> Besides, the struct is called struct drm_sched_entity_stats, i.e. stats of an
>> entity. The documentation says "execution stats for an entity", so it clearly
>> belongs to entites, no?
>> 
>>> So if pragmatic approach is not acceptable I would even rather create a
>>> new file along the lines of sched_entity_stats.h|c. Unless that turns
>>> out would have some other design wart of leaking knowledge of some other
>>> part of the scheduler (ie wouldn't be fully standalone).
>> 
>> Given the above, please just move this function into sched_entity.c.
>> 
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index f825ad9e2260..4c10c7ba6704 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -660,6 +660,7 @@ void drm_sched_job_arm(struct drm_sched_job *job)
>>>>>    
>>>>>    	job->sched = sched;
>>>>>    	job->s_priority = entity->priority;
>>>>> +	job->entity_stats = drm_sched_entity_stats_get(entity->stats);
>>>>>    
>>>>>    	drm_sched_fence_init(job->s_fence, job->entity);
>>>>>    }
>>>>> @@ -849,6 +850,7 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
>>>>>    		 * been called.
>>>>>    		 */
>>>>>    		dma_fence_put(&job->s_fence->finished);
>>>>> +		drm_sched_entity_stats_put(job->entity_stats);
>>>>>    	} else {
>>>>>    		/* The job was aborted before it has been committed to be run;
>>>>>    		 * notably, drm_sched_job_arm() has not been called.
>>>>> @@ -1000,8 +1002,10 @@ static void drm_sched_free_job_work(struct work_struct *w)
>>>>>    		container_of(w, struct drm_gpu_scheduler, work_free_job);
>>>>>    	struct drm_sched_job *job;
>>>>>    
>>>>> -	while ((job = drm_sched_get_finished_job(sched)))
>>>>> +	while ((job = drm_sched_get_finished_job(sched))) {
>>>>> +		drm_sched_entity_stats_job_add_gpu_time(job);
>>>>
>>>> Is it really always OK to update this value in the free job work? What if a new
>>>> job gets scheduled concurrently. Doesn't this hurt accuracy, since the entity
>>>> value has not been updated yet?
>>>
>>> What exactly you mean by entity value?
>>>
>>> If a new job gets scheduled concurrently then it is either just about to
>>> run, still running, both of which are not relevant for this finished
>>> job, and once finished will also end up here to have it's duration
>>> accounted against the stats.
>> 
>> So, what I mean is that the timeframe between a running job's fence being
>> signaled due to completion and the this same job is being freed in the free job
>> work by the driver can be pretty big.
>> 
>> In the meantime the scheduler might have to take multiple decisions on which
>> entity is next to be scheduled. And by calling
>> drm_sched_entity_stats_job_add_gpu_time() in drm_sched_job_cleanup() rather than
>> when it's finished fence is signaled we give up on accuracy in terms of
>> fairness, while fairness is the whole purpose of this scheduling approach.
>
> Right, so yes, the entity runtime lags the actual situation by the delay 
> between scheduling and running the free worker.
>
> TBH now the problem is I wrote this so long ago that I don't even 
> remember what was the reason I moved this from the job done callback to 
> the finished worker. Digging through my branches it happened during 
> April '25. I will try to remind myself while I am making other tweaks.
>
> But in principle, I am not too concerned with this. In practice this 
> delay isn't really measurable and for actual fairness much, much, bigger 
> issue is the general lack of preemption in many drivers, coupled with 
> submitting more than one job per entity at a time.

I might be that in your test environment the impact is not that big, but it
depends on CPU scheduler load and also highly depends the configuration of the
submit_wq.

So, unless there is a good reason, I don't see why we would not avoid this
latency.

next prev parent reply	other threads:[~2026-01-15 13:18 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-19 13:53 [PATCH v5 00/28] Fair(er) DRM scheduler Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 01/28] drm/sched: Consolidate entity run queue management Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 02/28] drm/sched: Move run queue related code into a separate file Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 03/28] drm/sched: Add some scheduling quality unit tests Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 04/28] drm/sched: Add some more " Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 05/28] drm/sched: Implement RR via FIFO Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 06/28] drm/sched: Free all finished jobs at once Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 07/28] drm/sched: Account entity GPU time Tvrtko Ursulin
2026-01-14 17:48   ` Danilo Krummrich
2026-01-15  8:56     ` Tvrtko Ursulin
2026-01-15 12:06       ` Danilo Krummrich
2026-01-15 12:52         ` Tvrtko Ursulin
2026-01-15 13:18           ` Danilo Krummrich [this message]
2026-01-20 10:31             ` Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 08/28] drm/sched: Remove idle entity from tree Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 09/28] drm/sched: Add fair scheduling policy Tvrtko Ursulin
2026-01-14 22:13   ` Danilo Krummrich
2026-01-15  8:28     ` Tvrtko Ursulin
2026-01-15 11:44       ` Danilo Krummrich
2026-01-15 13:00         ` Tvrtko Ursulin
2026-01-15 23:39           ` Danilo Krummrich
2026-01-20  9:51             ` Tvrtko Ursulin
2026-03-07  1:17               ` Hillf Danton
2025-12-19 13:53 ` [PATCH v5 10/28] drm/sched: Favour interactive clients slightly Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 11/28] drm/sched: Switch default policy to fair Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 12/28] drm/sched: Remove FIFO and RR and simplify to a single run queue Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 13/28] drm/sched: Embed run queue singleton into the scheduler Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 14/28] accel/amdxdna: Remove drm_sched_init_args->num_rqs usage Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 15/28] accel/rocket: " Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 16/28] accel/ethosu: " Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 17/28] drm/amdgpu: " Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 18/28] drm/etnaviv: " Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 19/28] drm/imagination: " Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 20/28] drm/lima: " Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 21/28] drm/msm: " Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 22/28] drm/nouveau: " Tvrtko Ursulin
2025-12-19 13:53   ` Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 23/28] drm/panfrost: " Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 24/28] drm/panthor: " Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 25/28] drm/sched: " Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 26/28] drm/v3d: " Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 27/28] drm/xe: " Tvrtko Ursulin
2025-12-19 13:53 ` [PATCH v5 28/28] drm/sched: Remove drm_sched_init_args->num_rqs Tvrtko Ursulin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DFP6Y7AJ8ET5.259R75H20S4LA@kernel.org \
    --to=dakr@kernel.org \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=kernel-dev@igalia.com \
    --cc=matthew.brost@intel.com \
    --cc=phasta@kernel.org \
    --cc=tvrtko.ursulin@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.