From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Hellstrom
Subject: Re: Nouveau fences?
Date: Sun, 28 Nov 2010 21:37:34 +0100
Message-ID: <4CF2BD8E.6050402@shipmail.org>
References: <4CF24D65.4030008@shipmail.org> <87k4jxmta8.fsf@riseup.net> <87y68dl97k.fsf@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path:
Received: from smtp-outbound-1.vmware.com (smtp-outbound-1.vmware.com [65.115.85.69]) by gabe.freedesktop.org (Postfix) with ESMTP id 5949D9E7AF for ; Sun, 28 Nov 2010 12:37:55 -0800 (PST)
In-Reply-To: <87y68dl97k.fsf@gmail.com>
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
To: Francisco Jerez
Cc: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On 11/28/2010 05:11 PM, Francisco Jerez wrote:
> Francisco Jerez writes:
>
>> Thomas Hellstrom writes:
>>
>>> Ben,
>>>
>>> I'm looking at a way to make TTM memory management asynchronous with
>>> the CPU. The idea is that you should basically be able to DMA data to
>>> and from memory regions without waiting for idle, as long as the GPU
>>> has a means to provide operation ordering.
>>>
>> Sounds good. I guess you're mainly dealing with BO eviction
>> synchronization? The only problem I see on our side is that calls to our
>> move() hook aren't guaranteed to be carried out in order (because of the
>> multiple hardware channels). I'm thinking that move() could be extended
>> with an optional sync_obj argument, that way move() would be able to
>> make sure that evictions are strictly ordered with respect to the fence
>> specified.
>>

The way evictions will work is that they appear to take place
"instantly", but are scheduled on a channel, and there will be a data
structure that keeps track of which fences need to be signaled before a
managed area can be reused.

The driver will need to provide a function that, given a list of
fences, returns a fence that, when it signals, guarantees that all
other fences in the list have signaled. Single-channel hardware will
just return the fence with the highest sequence. Multi-channel hardware
may need to insert command stream barriers, if available, and create a
new sync object to return, or resort to simply waiting to determine
which fence signals last.

I guess Nouveau can do command stream barriers (waiting for other
channels to reach a certain command before progressing)?

Needless to say, drivers need not activate async operation if they
don't want to, but for single-channel hardware it will hopefully be
very simple.

>>
>>> While doing that I looked a bit at the Nouveau fencing. It appears
>>> like waiting for fences is polling only (no irq to signal fences)? Is
>>> that correct?
>>>
>> That's right, nvidia hardware has no nice way to schedule a fence-like
>> interrupt we could selectively turn on and off around the sync_obj_wait
>> hook. There's a bunch of (more or less) chipset-specific hacks that
>> could be used to get an equivalent effect, but polling has seemed good
>> enough so far (in the typical case we only take the "lazy" path so CPU
>> usage is still OK).
>>

Indeed, I saw the same with Unichromes: lazy for throttling and not
lazy for other waits, although I ended up with an hrtimer polling loop
in the non-lazy case, since software fallbacks tended to eat a lot of
CPU while waiting for buffer idle.

/Thomas
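
For illustration only, here is a rough sketch of the single-channel case
mentioned above, where the driver function simply returns the fence with
the highest sequence. The names demo_fence and demo_fence_last are
hypothetical and are not part of TTM or Nouveau. The sketch assumes all
fences in the list belong to one channel and ignores sequence-number
wraparound, which a real driver would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence: a channel id plus a per-channel sequence number. */
struct demo_fence {
	uint32_t channel;
	uint32_t seq;
};

/*
 * Single-channel case: commands on one channel complete in order, so
 * once the fence with the highest sequence has signaled, every other
 * fence in the list has signaled too.
 *
 * Assumptions of this sketch: n > 0, all fences share one channel, and
 * sequence numbers do not wrap around.
 */
const struct demo_fence *
demo_fence_last(const struct demo_fence *const *fences, size_t n)
{
	const struct demo_fence *last;
	size_t i;

	assert(n > 0);
	last = fences[0];

	for (i = 1; i < n; i++) {
		/* Multi-channel lists would need barriers or waiting instead. */
		assert(fences[i]->channel == last->channel);
		if (fences[i]->seq > last->seq)
			last = fences[i];
	}
	return last;
}
```

A multi-channel driver could not use this comparison across channels,
since sequence numbers on different channels are unrelated; that is
where the command stream barriers or plain waiting would come in.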