From mboxrd@z Thu Jan  1 00:00:00 1970
From: =?ISO-8859-1?Q?Christian_K=F6nig?= <deathsimple@vodafone.de>
Subject: Re: RFC: Radeon multi ring support branch
Date: Wed, 16 Nov 2011 00:19:20 +0100
Message-ID: <4EC2F378.9090801@vodafone.de>
References: <4EABF8EC.3020701@vodafone.de>
	<20111115193212.GB2263@homer.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
Content-Transfer-Encoding: quoted-printable
Return-path: <dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org>
Received: from outgoing.email.vodafone.de (outgoing.email.vodafone.de
	[139.7.28.128])
	by gabe.freedesktop.org (Postfix) with ESMTP id F3D2E9E79A
	for <dri-devel@lists.freedesktop.org>;
	Tue, 15 Nov 2011 15:19:23 -0800 (PST)
In-Reply-To: <20111115193212.GB2263@homer.localdomain>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
To: Jerome Glisse <j.glisse@gmail.com>
Cc: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On 15.11.2011 20:32, Jerome Glisse wrote:
> On Sat, Oct 29, 2011 at 03:00:28PM +0200, Christian K=F6nig wrote:
>> Hello everybody,
>>
>> to support multiple compute rings, async DMA engines and UVD we need
>> to teach the radeon kernel module how to sync buffers between
>> different rings and make some changes to the command submission
>> ioctls.
>>
>> Since we can't release any documentation about async DMA or UVD
>> (yet), my current branch concentrates on getting the additional
>> compute rings on cayman running. Unfortunately those rings have
>> hardware bugs that can't be worked around, so they are actually not
>> very useful in a production environment, but they should do quite
>> well for this testing purpose.
>>
>> The branch can be found here:
>> http://cgit.freedesktop.org/~deathsimple/linux/log/
>>
>> Since some of the patches are quite intrusive, constantly rebaseing
>> them could get a bit painful. So I would like to see most of the
>> stuff included into drm-next, even if we don't make use of the new
>> functionality right now.
>>
>> Comments welcome,
>> Christian.
> So i have been looking back at all this and now there is somethings
> puzzling me. So if semaphore wait for a non null value at gpu address
> provided in the packet than the current implementation for the cs
> ioctl doesn't work when there is more than 2 rings to sync.
Semaphores are counters, so each signal operation is atomically =

incrementing the counter, while each wait operation is (atomically) =

checking if the counter is above zero and if it is decrement it. So you =

can signal/wait on the same address multiple times.

>
> As it will use only one semaphore so first ring to finish will
> mark the semaphore as done even if there is still other ring not
> done.
Nope, each wait operation will wait for a separate signal operation (at =

least I think so).
>
> This all make me wonder if some change to cs ioctl would make
> all this better. So idea of semaphore is to wait for some other
> ring to finish something. So let say we have following scenario:
> Application submit following to ring1: csA, csB
> Application now submit to ring2: cs1, cs2
> And application want csA to be done for cs1 and csB to be done
> for cs2.
>
> To achieve such usage pattern we would need to return fence seq
> or similar from the cs ioctl. So user application would know
> ringid+fence_seq for csA&  csB and provide this when scheduling
> cs1&  cs2. Here i am assuming MEM_WRITE/WAIT_REG_MEM packet
> are as good as MEM_SEMAPHORE packet. Ie the semaphore packet
> doesn't give us much more than MEM_WRITE/WAIT_REG_MEM would.
>
> To achieve that each ring got it's fence scratch addr where to
> write seq number. And then we use WAIT_REG_MEM on this addr
> and with the specific seq for the other ring that needs
> synchronization. This would simplify the semaphore code as
> we wouldn't need somethings new beside helper function and
> maybe extending the fence structure.
I played around with the same Idea before implementing the whole =

semaphore stuff, but the killer argument against it is that not all =

rings support the WAIT_REG_MEM command.

Also the semaphores are much more efficient than the WAIT_REG_MEM =

command, because all semaphore commands from the different rings are =

send to a central semaphore block, so that constant polling, and with it =

constant memory access can be avoided. Additional to that the =

WAIT_REG_MEM command has a minimum latency of Wait_Interval*16 clock =

cycles, while semaphore need 4 clock cycles for the command and 4 clock =

cycles for the result, so it definitely has a much lower latency.

We should also keep in mind that the semaphore block is not only capable =

to sync between different rings inside a single GPU, but can also =

communicate with another semaphore block in a second GPU. So it is a key =

part in a multi GPU environment.

> Anyway i put updating ring patch at :
> http://people.freedesktop.org/~glisse/mrings/
> It rebased on top of linus tree and it has several space
> indentation fixes and also a fix for no semaphore allocated
> issue (patch 5)
>
Thanks, I will try to integrate the changes tomorrow.

Christian.