From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
To: "Christian König" <deathsimple@vodafone.de>,
"Rob Clark" <robdclark@gmail.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-arch@vger.kernel.org,
"linaro-mm-sig@lists.linaro.org" <linaro-mm-sig@lists.linaro.org>,
Colin Cross <ccross@google.com>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>,
"linux-media@vger.kernel.org" <linux-media@vger.kernel.org>
Subject: Re: [PATCH 2/6] seqno-fence: Hardware dma-buf implementation of fencing (v4)
Date: Wed, 19 Feb 2014 14:25:59 +0100
Message-ID: <5304B0E7.4000802@canonical.com>
In-Reply-To: <530257E3.2060508@vodafone.de>
On 17-02-14 19:41, Christian König wrote:
> On 17.02.2014 19:24, Rob Clark wrote:
>> On Mon, Feb 17, 2014 at 12:36 PM, Christian König
>> <deathsimple@vodafone.de> wrote:
>>> On 17.02.2014 18:27, Rob Clark wrote:
>>>
>>>> On Mon, Feb 17, 2014 at 11:56 AM, Christian König
>>>> <deathsimple@vodafone.de> wrote:
>>>>> On 17.02.2014 16:56, Maarten Lankhorst wrote:
>>>>>
>>>>>> This type of fence can be used with hardware synchronization for simple
>>>>>> hardware that can block execution until the condition
>>>>>> (dma_buf[offset] - value) >= 0 has been met.
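For what it's worth, on the CPU side the signalled check is just a read plus
a wraparound-safe compare, roughly like the sketch below; the helper name
and the "sync_page" mapping are made up for illustration, not actual names
from the patch:

static bool seqno_signalled(const volatile u32 *sync_page,
			    u32 seqno_ofs, u32 value)
{
	/* Signed difference keeps the comparison sane across u32 wraparound. */
	return (s32)(sync_page[seqno_ofs / 4] - value) >= 0;
}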
>>>>>
>>>>> Can't we make that just "dma_buf[offset] != 0" instead? As far as I
>>>>> know that would match the definition Microsoft uses in its WDDM
>>>>> specification, and so make it much more likely that hardware
>>>>> supports it.
>>>> well 'buf[offset] >= value' at least means the same slot can be used
>>>> for multiple operations (with increasing values of 'value').. not sure
>>>> if that is something people care about.
>>>>
>>>> >=value seems to be possible with adreno and radeon. I'm not really sure
>>>> about others (although I presume it is at least supported for nv desktop
>>>> stuff). For hw that cannot do >=value, we can either have a different fence
>>>> implementation which uses the !=0 approach, or change the seqno-fence
>>>> implementation later if needed. But if someone has hw that can do !=0 but
>>>> not >=value, speak up now ;-)
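To make the slot-reuse point above concrete: with >= semantics one 4-byte
location can back every fence a context emits, just by handing out
increasing values. The struct and helper here are invented for
illustration:

struct ctx_sync {
	u32 *slot;	/* CPU view of dma_buf[offset]; hw writes it on completion */
	u32 last_seqno;	/* last value handed out */
};

static u32 ctx_emit_fence(struct ctx_sync *ctx)
{
	/* Every submission gets the next value, all sharing one slot.
	 * With "!= 0" semantics you'd need a fresh slot per operation. */
	return ++ctx->last_seqno;
}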
>>>
>>> Here! Radeon can only do >=value on the DMA and 3D engine, but not with UVD
>>> or VCE. And for the 3D engine it means draining the pipe, which isn't really
>>> a good idea.
>> hmm, ok.. forgot you have a few extra rings compared to me. Is UVD
>> re-ordering from decode-order to display-order for you in hw? If not,
>> I guess you need sw intervention anyway when a frame is done, for
>> frame re-ordering, so maybe hw->hw sync doesn't matter as much as
>> gpu/3d->display. For dma<->3d interactions it seems like you would
>> care more about hw<->hw sync, but I guess you aren't likely to use
>> GPU A to do a resolve blit for GPU B..
>
> No, UVD isn't reordering, but since frame reordering is predictable you usually end up pipelining everything to the hardware. E.g. you send the decode commands in decode order to the UVD block, and if you have an overlay active, one of the frames is going to be the first to display, and then you want to wait for it on the display side.
>
>> For the 3D ring, I assume you probably want a CP_WAIT_FOR_IDLE before a
>> CP_MEM_WRITE to update the fence value in memory (for the one signalling
>> the fence). But why would you need that before a CP_WAIT_REG_MEM (for
>> the one waiting for the fence)? I don't exactly have documentation
>> for the adreno version of CP_WAIT_REG_{MEM,EQ,GTE}.. but PFP and ME
>> appear to use the same instruction set as r600, so I'm pretty sure they
>> should have similar capabilities.. CP_WAIT_REG_MEM appears to be the
>> same but with 32-bit gpu addresses vs 64-bit.
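For reference, on r600-class hardware such a wait would be emitted roughly
like the following. The field encodings are quoted from memory of the cik
headers (function 5 should be ">=", mem_space 1 should mean "poll memory"),
so treat the exact values as assumptions, not gospel:

/* Sketch of a ">= value" poll on an r600-style ring. */
radeon_ring_write(ring, PACKET3(PACKET3_WAIT_REG_MEM, 5));
radeon_ring_write(ring, WAIT_REG_MEM_FUNCTION(5) |	/* >= */
			WAIT_REG_MEM_MEM_SPACE(1));	/* memory */
radeon_ring_write(ring, lower_32_bits(gpu_addr));
radeon_ring_write(ring, upper_32_bits(gpu_addr) & 0xff);
radeon_ring_write(ring, value);		/* reference value */
radeon_ring_write(ring, 0xffffffff);	/* mask */
radeon_ring_write(ring, 4);		/* poll interval */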
>
> You shouldn't use any of the CP commands for engine synchronization (neither for wait nor for signal). The PFP and ME are just the top of a quite deep pipeline, and when you use any of the CP_WAIT functions you block them on something, which drains the pipeline.
>
> With the semaphore and fence commands the values are just attached as a prerequisite to the draw command: the CP sets up the draw environment and issues the command, but its actual execution is delayed until the "!= 0" condition hits. In the meantime the CP is already preparing the next draw operation.
>
> But at least for compute queues wait semaphores aren't the perfect solution either. What you need then is a GPU scheduler that uses a kernel task to set up the command submission for you once all prerequisites are met.
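Something like the sketch below, I suppose: a work item per submission that
blocks on the prerequisites before pushing to the ring. Everything here is
invented except fence_wait()/fence_put() from patch 1 of this series:

struct sched_job {
	struct work_struct work;
	struct fence *prereq;	/* in reality an array of them */
	void (*submit)(struct sched_job *job);
};

static void sched_job_run(struct work_struct *work)
{
	struct sched_job *job = container_of(work, struct sched_job, work);

	/* The kernel task blocks here until the prerequisite signals. */
	fence_wait(job->prereq, false);
	job->submit(job);
	fence_put(job->prereq);
}

/* queued with INIT_WORK(&job->work, sched_job_run) + schedule_work() */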
nouveau has sort of a scheduler in hardware. It can yield when waiting on a semaphore, each process gets its own context, and the timeslices can be adjusted. ;-)

But I don't mind changing this patch when an actual user pops up. Nouveau can only do a wait for (*sema & mask) != 0 on nvc0 and newer, where the mask can be chosen. On the older optimus hardware that is relevant here it can do == somevalue and >= somevalue, so if we know the location was zero beforehand and we know the sign of the new value, that could work too.
Adding ops and a separate mask later on when users pop up is fine with me; the original design here was chosen so I could map the intel status page read-only into the process-specific nvidia vm.
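If that happens, I'd expect the interface to grow something along these
lines. Just a sketch of a possible extension, not part of this series; the
mask field in particular is speculative:

enum seqno_fence_condition {
	SEQNO_FENCE_WAIT_GEQUAL,	/* (dma_buf[ofs] - value) >= 0 */
	SEQNO_FENCE_WAIT_NONZERO,	/* (dma_buf[ofs] & mask) != 0 */
};

struct seqno_fence {
	struct fence base;
	const struct fence_ops *ops;
	struct dma_buf *sync_buf;
	uint32_t seqno_ofs;
	enum seqno_fence_condition condition;
	uint32_t mask;			/* only for the NONZERO case */
};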
~Maarten
Thread overview:
2014-02-17 15:55 [PATCH 0/6] dma-buf synchronization patches Maarten Lankhorst
2014-02-17 15:55 ` [PATCH 1/6] fence: dma-buf cross-device synchronization (v17) Maarten Lankhorst
2014-02-17 16:56   ` Rob Clark
2014-02-17 15:56 ` [PATCH 2/6] seqno-fence: Hardware dma-buf implementation of fencing (v4) Maarten Lankhorst
2014-02-17 16:55   ` Rob Clark
2014-02-17 16:56   ` Christian König
2014-02-17 17:27     ` Rob Clark
2014-02-17 17:36       ` Christian König
2014-02-17 18:24         ` Rob Clark
2014-02-17 18:41           ` Christian König
2014-02-19 13:25             ` Maarten Lankhorst [this message]
2014-03-03 21:01   ` Daniel Vetter
2014-02-17 15:56 ` [PATCH 3/6] dma-buf: use reservation objects Maarten Lankhorst
2014-02-17 16:46   ` Rob Clark
2014-02-19 13:58   ` Thomas Hellstrom
2014-02-17 15:57 ` [PATCH 4/6] android: convert sync to fence api, v4 Maarten Lankhorst
2014-02-19 13:56   ` Thomas Hellstrom
2014-02-24 13:01     ` Maarten Lankhorst
2014-03-03 21:11   ` Daniel Vetter
2014-03-04  7:50     ` Maarten Lankhorst
2014-03-04  8:14       ` Daniel Vetter
2014-03-04  8:20         ` Maarten Lankhorst
2014-03-04 10:00           ` Daniel Vetter
2014-03-04 10:42             ` Maarten Lankhorst
2014-02-17 15:58 ` [PATCH 5/6] reservation: add support for fences to enable cross-device synchronisation Maarten Lankhorst
2014-02-17 16:54   ` Rob Clark
2014-02-17 15:58 ` [PATCH 6/6] dma-buf: add poll support, v2 Maarten Lankhorst
2014-02-17 16:38   ` Rob Clark