From: Boris Brezillon <boris.brezillon@collabora.com>
To: Chia-I Wu <olvaffe@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>,
	Steven Price <steven.price@arm.com>,
	Liviu Dudau <liviu.dudau@arm.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context
Date: Wed, 20 May 2026 10:09:46 +0200
Message-ID: <20260520100946.398c5282@fedora>
In-Reply-To: <CAPaKu7ToQmHHhTyb-az_b7EG2iRpc23=AJOYQ5P=HgrG29bS1A@mail.gmail.com>

On Tue, 19 May 2026 14:04:47 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Tue, May 19, 2026 at 1:45 PM Chia-I Wu <olvaffe@gmail.com> wrote:
> >
> > On Tue, May 19, 2026 at 11:26 AM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:  
> > >
> > > On Tue, 19 May 2026 10:16:26 -0700
> > > Chia-I Wu <olvaffe@gmail.com> wrote:
> > >  
> > > > On Tue, May 19, 2026 at 12:53 AM Boris Brezillon
> > > > <boris.brezillon@collabora.com> wrote:  
> > > > >
> > > > > On Mon, 18 May 2026 16:33:20 -0700
> > > > > Chia-I Wu <olvaffe@gmail.com> wrote:
> > > > >
> > > > >  
> > > > > > > >
> > > > > > > >  
> > > > > > > > >  
> > > > > > > > > >  
> > > > > > > > > > >         if (!ptdev->scheduler)
> > > > > > > > > > >                 return;
> > > > > > > > > > >
> > > > > > > > > > > -       atomic_or(events, &ptdev->scheduler->fw_events);
> > > > > > > > > > > -       sched_queue_work(ptdev->scheduler, fw_events);
> > > > > > > > > > > +       guard(spinlock_irqsave)(&ptdev->scheduler->events_lock);
> > > > > > > > > > > +
> > > > > > > > > > > +       if (events & JOB_INT_GLOBAL_IF) {
> > > > > > > > > > > +               sched_process_global_irq_locked(ptdev);
> > > > > > > > > > > +               events &= ~JOB_INT_GLOBAL_IF;
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > > +       while (events) {
> > > > > > > > > > > +               u32 csg_id = ffs(events) - 1;
> > > > > > > > > > > +
> > > > > > > > > > > +               sched_process_csg_irq_locked(ptdev, csg_id);
> > > > > > > > > > > +               events &= ~BIT(csg_id);
> > > > > > > > > > > +       }  
> > > > > > > > > > This handles all fw events in the irq context. Are there concerns that
> > > > > > > > > > it may take too long? I might be wrong, but it seems possible to
> > > > > > > > > > handle only CSG_SYNC_UPDATE and defer the rest as before.  
> > > > > > > > >
> > > > > > > > > I started with just the SYNC_UPDATE processing done in the hard-irq
> > > > > > > > > context, but after auditing the other stuff done in the handler, I
> > > > > > > > > realized it's basically just deferring all actual processing to work
> > > > > > > > > items. Yes, there's the overhead of demuxing the events from the
> > > > > > > > > ack/req regs, but part of this is already done to get to SYNC_UPDATE
> > > > > > > > > anyway, so at this point we're probably better off demuxing everything
> > > > > > > > > and scheduling works for all kind of events.
> > > > > > > > >
> > > > > > > > > I also compared the perfs between the two approaches (though I didn't
> > > > > > > > > do as much testing as I did with the new version, so I might have
> > > > > > > > > missed something), and it didn't seem to matter at all, because the
> > > > > > > > > interrupts we receive the most are SYNC_UPDATE and IDLE events, and
> > > > > > > > > those are at the same level.  
> > > > > > > > Looking at ftrace irq events, when there is one active csg,
> > > > > > > > panthor-job takes 6us (median) / 17us (95%) / 27us (slowest).
> > > > > > > >
> > > > > > > > I don't have a good sense if that's considered normal in hardirq. But
> > > > > > > > if that is ever an issue, and if the majority of the time is spent in
> > > > > > > > CSG_SYNC_UPDATE anyway, we can always revert the last patch to move
> > > > > > > > processing to threaded handler.  
> > > > > > >
> > > > > > > Actually, the threaded -> hard transition (patch 9) is where the perf
> > > > > > > gain is.  
> > > > > > hardirq is even more timely for sure. For our use case, the threaded
> > > > > > handler is RT and is also good enough.  
> > > > >
> > > > > Yeah, true. I forgot you were forcing RT priority on threaded handlers.
> > > > > Anyway, let's stick to hardirqs for now, and revisit it if it proves to
> > > > > be too much work done in irq context.  
> > > > Just want to clarify that irq_thread calls sched_set_fifo to make the
> > > > task RT. The behavior is universal and is not specific to any
> > > > downstream kernel.  

What RT means differs depending on whether the kernel is built with
PREEMPT or PREEMPT_RT, though. I assume you're using PREEMPT, not
PREEMPT_RT.

> > >
> > > Hm, interesting. In my testing, any of the changes before patch 9
> > > didn't make a huge difference in term of perf, patch 9 is where the perf
> > > gains happen. For the record, patch 6 is where we get rid of the
> > > threaded -> work round-trip for job completion/fence signaling, and it
> > > didn't seem to reflect in the benchmark results, but I'll do another
> > > round of tests before posting v3, just to confirm.  
> > We care the most about signaling latency for this series.

Yes, I know. It's just that it also seemed to help throughput, which I
initially checked to make sure we were not regressing performance
significantly by interrupting the system aggressively. I guess the
reason is that, by reducing the latency, we also unblock the job
submitter earlier (when jobs tend to be serialized because of deps,
getting signaled early means you can submit more).

> > I collected
> > some numbers with baseline, with this series, and with patch 9
> > reverted at https://gitlab.freedesktop.org/panfrost/linux/-/work_items/85#note_3481308.
> > Reposting the numbers here for reference
> >
> > |                    | baseline | entire series | patch 9 reverted |
> > | -                  | -        | -             | -                |
> > | frag job median    | 2.8ms    | 2.2ms         | 2.2ms            |
> > | frag job 95%       | 4.5ms    | 2.8ms         | 2.8ms            |
> > | frag job 99%       | 4.9ms    | 2.8ms         | 2.8ms            |
> > | panthor-job median | 0.8us    | 6.2us         | 0.9us            |
> > | panthor-job 95%    | 1.5us    | 16.6us        | 1.5us            |
> > | panthor-job 99%    | 1.6us    | 28.0us        | 1.8us            |  
> 
> panthor-job rows are the durations of the raw irq handlers, collected
> from irq/irq_handler_{entry,exit}.
> 
> frag job rows are the durations from frag jobs, collected from
> gpu_scheduler/drm_sched_job_{run,done}.
> 
> The fence signaling paths of them are
> 
>  - baseline: raw handler -> rt threaded handler -> wq job -> wq job ->
> fence signal
>  - entire series: raw handler -> fence signal
>  - patch 9 reverted: raw handler -> rt threaded handler -> fence signal

I just ran another set of throughput tests, and I can confirm the gains
are noticeable only with patch 9 applied (that's on rk3588, which embeds
a G610, so not the exact same setup). As an example, on
gfxbench/gl_manhattan, the score goes from 2391 to 2457 (roughly +2.8%).

Now I need to set things up to measure latency like you did and make
sure I'm observing the same thing: threaded handlers providing roughly
the same latency as hardirq handlers. If not, it probably has to do with
some config options that differ and change the preemptibility of the
system.

I'll hold off on the submission of v3 until this is done, because if
threaded handlers are roughly as efficient as hardirq ones, we probably
want to stick to threaded handlers.
