From: Matthew Brost <matthew.brost@intel.com>
To: <phasta@kernel.org>
Cc: "Boris Brezillon" <boris.brezillon@collabora.com>,
"Chia-I Wu" <olvaffe@gmail.com>,
"ML dri-devel" <dri-devel@lists.freedesktop.org>,
intel-xe@lists.freedesktop.org,
"Steven Price" <steven.price@arm.com>,
"Liviu Dudau" <liviu.dudau@arm.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Danilo Krummrich" <dakr@kernel.org>,
"Christian König" <ckoenig.leichtzumerken@gmail.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
"open list" <linux-kernel@vger.kernel.org>,
tj@kernel.org
Subject: Re: drm_sched run_job and scheduling latency
Date: Thu, 5 Mar 2026 01:10:06 -0800
Message-ID: <aalIbgi71svPQs3Z@lstrano-desk.jf.intel.com>
In-Reply-To: <fa4a9c55792b0e79d94faa82085b693aa7feb989.camel@mailbox.org>

On Thu, Mar 05, 2026 at 09:38:16AM +0100, Philipp Stanner wrote:
> On Thu, 2026-03-05 at 09:27 +0100, Boris Brezillon wrote:
> > Hi Matthew,
> >
> > On Wed, 4 Mar 2026 18:04:25 -0800
> > Matthew Brost <matthew.brost@intel.com> wrote:
> >
> > > On Wed, Mar 04, 2026 at 02:51:39PM -0800, Chia-I Wu wrote:
> > > > Hi,
> > > >
> > > > Our system compositor (surfaceflinger on android) submits gpu jobs
> > > > from a SCHED_FIFO thread to an RT gpu queue. However, because
> > > > workqueue threads are SCHED_NORMAL, the scheduling latency from submit
> > > > to run_job can sometimes cause frame misses. We are seeing this on
> > > > panthor and xe, but the issue should be common to all drm_sched users.
> > > >
> > >
> > > I'm going to assume that since this is a compositor, you do not pass
> > > input dependencies to the page-flip job. Is that correct?
> > >
> > > If so, I believe we could fairly easily build an opt-in DRM sched path
> > > that directly calls run_job in the exec IOCTL context (I assume this is
> > > SCHED_FIFO) if the job has no dependencies.
> >
> > I guess by ::run_job() you mean something slightly more involved that
> > checks if:
> >
> > - other jobs are pending

Yes.

> > - enough credits (AKA ringbuf space) is available

Yes.

> > - and probably other stuff I forgot about

The scheduler must also not be stopped, and the bypass path has to be
serialized against scheduler stop/start.

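To make the checks listed above concrete, here is a minimal sketch as a
self-contained userspace C model. It is not the actual patch and not
drm/sched code: every name in it (toy_sched, toy_job,
toy_sched_can_bypass, toy_sched_submit) is hypothetical, and it is
single-threaded, so the serialization against scheduler stop/start is
only marked with a comment where a real implementation would need
locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in drm/sched. */
struct toy_job {
	unsigned int num_deps;	/* unsignaled input dependencies */
	unsigned int credits;	/* ring buffer space the job needs */
	bool ran_inline;	/* true if the bypass path was taken */
};

struct toy_sched {
	bool stopped;			/* scheduler stopped, e.g. for reset */
	unsigned int pending_jobs;	/* jobs already queued ahead of us */
	unsigned int credit_limit;	/* total ring buffer space */
	unsigned int credits_in_use;	/* space held by in-flight jobs */
};

/* All conditions from the discussion must hold for the fast path. */
static bool toy_sched_can_bypass(const struct toy_sched *s,
				 const struct toy_job *job)
{
	if (s->stopped)
		return false;	/* scheduler is not running */
	if (job->num_deps)
		return false;	/* job still waits on dependencies */
	if (s->pending_jobs)
		return false;	/* would overtake queued jobs */
	if (s->credits_in_use + job->credits > s->credit_limit)
		return false;	/* not enough ring buffer space */
	return true;
}

/*
 * Returns true if the job ran in the caller's context, false if it
 * was queued for the worker. A real implementation would have to hold
 * a lock here that is also taken by scheduler stop/start.
 */
static bool toy_sched_submit(struct toy_sched *s, struct toy_job *job)
{
	assert(s != NULL && job != NULL);

	if (toy_sched_can_bypass(s, job)) {
		s->credits_in_use += job->credits;
		job->ran_inline = true;	/* stands in for calling run_job */
		return true;
	}

	s->pending_jobs++;		/* slow path: defer to the worker */
	job->ran_inline = false;
	return false;
}
```

Note that in this model, once a single job has taken the slow path,
every later job falls back to it as well until the queue drains, which
is what keeps submission order intact.
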
> >
> > >
> > > This would likely break some of Xe’s submission-backend assumptions
> > > around mutual exclusion and ordering based on the workqueue, but that
> > > seems workable. I don’t know how the Panthor code is structured or
> > > whether they have similar issues.
> >
> > Honestly, I'm not thrilled by this fast-path/call-run_job-directly idea
> > you're describing. There's just so many things we can forget that would
> > lead to races/ordering issues that will end up being hard to trigger and
> > debug.
> >
>
> +1
>
> I'm not thrilled either. More like the opposite of thrilled actually.
>
> Even if we could get that to work. This is more of a maintainability
> issue.
>
> The scheduler is full of insane performance hacks for this or that
> driver. Lockless accesses, a special lockless queue only used by that
> one party in the kernel (a lockless queue which is nowadays, after N
> reworks, being used with a lock. Ah well).
>

This is not relevant to this discussion; see below. In general, I agree
that the lockless tricks in the scheduler are not great, nor is the fact
that the scheduler became a dumping ground for driver-specific features,
but that is not what we're talking about here.

> In the past discussions Danilo and I made it clear that more major
> features in _new_ patch series aimed at getting merged into drm/sched
> must be preceded by cleanup work to address some of the scheduler's
> major problems.

Ah, we've moved to dictatorship quickly. Noted.

>

I can't say I agree with either of you here.

In about an hour, I seemingly have a bypass path working in DRM sched +
Xe, and my diff is:

  108 insertions(+), 31 deletions(-)

About 40 lines of the insertions are kernel-doc, so I'm not buying that
this is a maintenance issue or a major feature - it is literally a
single new function.

I understand a bypass path can create issues. For example, on certain
queues in Xe I definitely can't use the bypass path, so Xe simply
wouldn't use it in those cases. This is the driver's choice to use or
not. If a driver doesn't know how to use the scheduler, well, that's on
the driver. Providing a simple, documented function as a fast path
really isn't some crazy idea.

The alternative (asking for RT workqueues or changing the design to use
kthread_worker) actually is.

> That's especially true if it's features aimed at performance buffs.
>

With the above mindset, I'm actually very confused why this series [1]
would even be considered, as it is an order of magnitude greater in
complexity than my suggestion here.

Matt

[1] https://patchwork.freedesktop.org/series/159025/

>
>
> P.