From: Boris Brezillon <boris.brezillon@collabora.com>
To: Jason Ekstrand <jason@jlekstrand.net>
Cc: Rob Herring <robh+dt@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>,
Mailing list - DRI developers <dri-devel@lists.freedesktop.org>,
Steven Price <steven.price@arm.com>
Subject: Re: [RFC PATCH 0/7] drm/panfrost: Add a new submit ioctl
Date: Fri, 12 Mar 2021 21:06:17 +0100
Message-ID: <20210312210617.044bf712@collabora.com>
In-Reply-To: <20210312192513.469462ef@collabora.com>
On Fri, 12 Mar 2021 19:25:13 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:
> > So where does this leave us? Well, it depends on your submit model
> > and exactly how you handle pipeline barriers that sync between
> > engines. If you're taking option 3 above and doing two command
> > buffers for each VkCommandBuffer, then you probably want two
> > serialized timelines, one for each engine, and some mechanism to tell
> > the kernel driver "these two command buffers have to run in parallel"
> > so that your ping-pong works. If you're doing 1 or 2 above, I think
> > you probably still want two simple syncobjs, one for each engine. You
> > don't really have any need to go all that far back in history. All
> > you really need to describe is "command buffer X depends on previous
> > compute work" or "command buffer X depends on previous binning work".
>
> Okay, so this will effectively force in-order execution. Let's take your
> previous example and add 2 more jobs at the end that have no deps on
> previous commands:
>
> vkBeginRenderPass() /* Writes to ImageA */
> vkCmdDraw()
> vkCmdDraw()
> ...
> vkEndRenderPass()
> vkPipelineBarrier(imageA /* fragment -> compute */)
> vkCmdDispatch() /* reads imageA, writes BufferB */
> vkBeginRenderPass() /* Writes to ImageC */
> vkCmdBindVertexBuffers(bufferB)
> vkCmdDraw();
> ...
> vkEndRenderPass()
> vkBeginRenderPass() /* Writes to ImageD */
> vkCmdDraw()
> ...
> vkEndRenderPass()
>
> A: Vertex for the first draw on the compute engine
> B: Vertex for the second draw on the compute engine
> C: Fragment for the first draw on the binning engine; depends on A
> D: Fragment for the second draw on the binning engine; depends on B
> E: Compute on the compute engine; depends on C and D
> F: Vertex for the third draw on the compute engine; depends on E
> G: Fragment for the third draw on the binning engine; depends on F
> H: Vertex for the fourth draw on the compute engine
> I: Fragment for the fourth draw on the binning engine
>
> When we reach E, we might be waiting for D to finish before scheduling
> the job, and because of the implicit serialization we have on the
> compute queue (F implicitly depends on E, and H on F) we can't schedule
> H either, which could, in theory, be started. I guess that's where the
> term submission order is a bit unclear to me. The action of starting a
> job sounds like execution order to me (the order you start jobs
> determines the execution order since we only have one HW queue per job
> type). All implicit deps have been calculated when we queued the job to
> the SW queue, and I thought that would be enough to meet the submission
> order requirements, but I might be wrong.
>
> The PoC I have was trying to get rid of this explicit serialization on
> the compute and fragment queues by having one syncobj timeline
> (queue(<syncpoint>)) and synchronization points (Sx).
>
> S0: in-fences=<waitSemaphores[]>, out-fences=<explicit_deps> #waitSemaphore sync point
> A: in-fences=<explicit_deps>, out-fences=<queue(1)>
> B: in-fences=<explicit_deps>, out-fences=<queue(2)>
> C: in-fences=<explicit_deps>, out-fences=<queue(3)> #implicit dep on A through the tiler context
> D: in-fences=<explicit_deps>, out-fences=<queue(4)> #implicit dep on B through the tiler context
> E: in-fences=<explicit_deps>, out-fences=<queue(5)> #implicit dep on D through imageA
> F: in-fences=<explicit_deps>, out-fences=<queue(6)> #implicit dep on E through buffer B
> G: in-fences=<explicit_deps>, out-fences=<queue(7)> #implicit dep on F through the tiler context
> H: in-fences=<explicit_deps>, out-fences=<queue(8)>
> I: in-fences=<explicit_deps>, out-fences=<queue(9)> #implicit dep on H through the tiler buffer
> S1: in-fences=<queue(9)>, out-fences=<signalSemaphores[],fence> #signalSemaphore,fence sync point
> # QueueWaitIdle is implemented with a wait(queue(0)), AKA wait on the last point
>
> With this solution H can be started before E if the compute slot
> is empty and E's implicit deps are not done. It's probably overkill,
> but I thought maximizing GPU utilization was important.
Never mind, I forgot the drm scheduler was dequeuing jobs in order, so 2
syncobjs (one per queue type) is indeed the right approach.
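
To make the difference concrete, here is a toy simulation of jobs A-I
from the example above. It is only a sketch under simplifying
assumptions, not the drm scheduler's actual logic: every job takes one
tick, there is one HW slot per queue, a job's deps must have completed in
an earlier tick, and the dep of I on H is taken from the tiler-buffer
note in the timeline listing. The names JOBS and simulate() are made up
for this illustration.

```python
# Job graph from the example: name -> (queue, deps), in submission order.
# The I -> H dep is an assumption taken from the "tiler buffer" note.
JOBS = {
    "A": ("compute", []),
    "B": ("compute", []),
    "C": ("fragment", ["A"]),
    "D": ("fragment", ["B"]),
    "E": ("compute", ["C", "D"]),
    "F": ("compute", ["E"]),
    "G": ("fragment", ["F"]),
    "H": ("compute", []),
    "I": ("fragment", ["H"]),
}


def simulate(in_order):
    """Return {job: start tick} for a toy model with one slot per queue.

    in_order=True: only the head of each queue may start (FIFO dequeue).
    in_order=False: the first pending job whose deps are done may start.
    """
    pending = {"compute": [], "fragment": []}
    for name, (queue, _deps) in JOBS.items():
        pending[queue].append(name)

    done = set()
    start = {}
    tick = 0
    while any(pending.values()):
        started = []
        for jobs in pending.values():
            candidates = jobs[:1] if in_order else list(jobs)
            for name in candidates:
                if all(dep in done for dep in JOBS[name][1]):
                    jobs.remove(name)
                    start[name] = tick
                    started.append(name)
                    break  # one HW slot per queue
        # Jobs started in this tick complete at the end of the tick.
        done.update(started)
        tick += 1
    return start


in_order = simulate(in_order=True)
out_of_order = simulate(in_order=False)
print("in-order:    ", in_order)
print("out-of-order:", out_of_order)
```

In this toy model H starts at tick 5 with in-order dequeuing (behind E
and F) and at tick 2 when reordering is allowed (while E is still
waiting on D). Since the scheduler dequeues in order, the single
timeline cannot deliver that gain in practice.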