From: Christian König
Subject: Re: RFC: Radeon multi ring support branch
Date: Wed, 16 Nov 2011 00:19:20 +0100
Message-ID: <4EC2F378.9090801@vodafone.de>
To: Jerome Glisse
Cc: dri-devel@lists.freedesktop.org

On 15.11.2011 20:32, Jerome Glisse wrote:
> On Sat, Oct 29, 2011 at 03:00:28PM +0200, Christian König wrote:
>> Hello everybody,
>>
>> to support multiple compute rings, async DMA engines and UVD we need
>> to teach the radeon kernel module how to sync buffers between
>> different rings and make some changes to the command submission
>> ioctls.
>>
>> Since we can't release any documentation about async DMA or UVD
>> (yet), my current branch concentrates on getting the additional
>> compute rings on cayman running. Unfortunately those rings have
>> hardware bugs that can't be worked around, so they are actually not
>> very useful in a production environment, but they should do quite
>> well for this testing purpose.
>>
>> The branch can be found here:
>> http://cgit.freedesktop.org/~deathsimple/linux/log/
>>
>> Since some of the patches are quite intrusive, constantly rebaseing
>> them could get a bit painful. So I would like to see most of the
>> stuff included into drm-next, even if we don't make use of the new
>> functionality right now.
>>
>> Comments welcome,
>> Christian.
> So i have been looking back at all this and now there is somethings
> puzzling me. So if semaphore wait for a non null value at gpu address
> provided in the packet than the current implementation for the cs
> ioctl doesn't work when there is more than 2 rings to sync.

Semaphores are counters: each signal operation atomically increments
the counter, while each wait operation (atomically) checks whether the
counter is above zero and, if it is, decrements it. So you can
signal/wait on the same address multiple times.

>
> As it will use only one semaphore so first ring to finish will
> mark the semaphore as done even if there is still other ring not
> done.

Nope, each wait operation will wait for a separate signal operation (at
least I think so).

>
> This all make me wonder if some change to cs ioctl would make
> all this better. So idea of semaphore is to wait for some other
> ring to finish something. So let say we have following scenario:
> Application submit following to ring1: csA, csB
> Application now submit to ring2: cs1, cs2
> And application want csA to be done for cs1 and csB to be done
> for cs2.
>
> To achieve such usage pattern we would need to return fence seq
> or similar from the cs ioctl. So user application would know
> ringid+fence_seq for csA& csB and provide this when scheduling
> cs1& cs2. Here i am assuming MEM_WRITE/WAIT_REG_MEM packet
> are as good as MEM_SEMAPHORE packet. Ie the semaphore packet
> doesn't give us much more than MEM_WRITE/WAIT_REG_MEM would.
>
> To achieve that each ring got it's fence scratch addr where to
> write seq number. And then we use WAIT_REG_MEM on this addr
> and with the specific seq for the other ring that needs
> synchronization. This would simplify the semaphore code as
> we wouldn't need somethings new beside helper function and
> maybe extending the fence structure.

I played around with the same idea before implementing the whole
semaphore stuff, but the killer argument against it is that not all
rings support the WAIT_REG_MEM command.

Also, the semaphores are much more efficient than the WAIT_REG_MEM
command, because all semaphore commands from the different rings are
sent to a central semaphore block, so that constant polling, and with
it constant memory access, can be avoided. In addition, the
WAIT_REG_MEM command has a minimum latency of Wait_Interval*16 clock
cycles, while a semaphore needs 4 clock cycles for the command and 4
clock cycles for the result, so it definitely has a much lower latency.

We should also keep in mind that the semaphore block is not only
capable of syncing between different rings inside a single GPU, but can
also communicate with another semaphore block in a second GPU. So it is
a key part in a multi GPU environment.

> Anyway i put updating ring patch at :
> http://people.freedesktop.org/~glisse/mrings/
> It rebased on top of linus tree and it has several space
> indentation fixes and also a fix for no semaphore allocated
> issue (patch 5)
>

Thanks, I will try to integrate the changes tomorrow.

Christian.
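
For anyone following the thread, the counter semantics and the latency figures described above can be sketched as a toy model. This is only an illustration in Python, not radeon driver code: the class and function names are made up for this sketch, and on the hardware the increment and the check-and-decrement happen atomically inside the semaphore block.

```python
class CountingSemaphore:
    """Toy model of one semaphore address (hypothetical, not driver code)."""

    def __init__(self):
        self.count = 0

    def signal(self):
        # Each signal operation increments the counter.
        self.count += 1

    def try_wait(self):
        # A wait succeeds only if the counter is above zero, and then
        # decrements it. Otherwise the waiting ring would stay blocked.
        if self.count > 0:
            self.count -= 1
            return True
        return False


def wait_reg_mem_min_latency(wait_interval):
    # Minimum latency stated in the mail: Wait_Interval * 16 clock cycles.
    return wait_interval * 16


def semaphore_latency():
    # 4 clock cycles for the command plus 4 clock cycles for the result.
    return 4 + 4


if __name__ == "__main__":
    sem = CountingSemaphore()
    sem.signal()           # first ring signals
    sem.signal()           # second ring signals on the same address
    print(sem.try_wait())  # first wait consumes one signal
    print(sem.try_wait())  # second wait consumes the other
    print(sem.try_wait())  # no signal left, this wait would block
```

In this model each successful wait consumes exactly one signal, which is why several rings can signal and wait on the same address without the first signal marking everything as done.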