From mboxrd@z Thu Jan  1 00:00:00 1970
From: Thomas Hellstrom <thomas@shipmail.org>
Subject: Re: Nouveau fences?
Date: Sun, 28 Nov 2010 21:37:34 +0100
Message-ID: <4CF2BD8E.6050402@shipmail.org>
References: <4CF24D65.4030008@shipmail.org> <87k4jxmta8.fsf@riseup.net>
	<87y68dl97k.fsf@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org>
Received: from smtp-outbound-1.vmware.com (smtp-outbound-1.vmware.com
	[65.115.85.69])
	by gabe.freedesktop.org (Postfix) with ESMTP id 5949D9E7AF
	for <dri-devel@lists.freedesktop.org>;
	Sun, 28 Nov 2010 12:37:55 -0800 (PST)
In-Reply-To: <87y68dl97k.fsf@gmail.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
To: Francisco Jerez <currojerez@gmail.com>
Cc: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On 11/28/2010 05:11 PM, Francisco Jerez wrote:
> Francisco Jerez<currojerez@riseup.net>  writes:
>
>    
>> Thomas Hellstrom<thomas@shipmail.org>  writes:
>>
>>      
>>> Ben,
>>>
>>> I'm looking at a way to make TTM memory management asynchronous with
>>> the CPU. The idea is that you should basically be able to DMA data to
>>> and from memory regions without waiting for idle, as long as the GPU
>>> has a means to provide operation ordering.
>>>
>>>        
>> Sounds good. I guess you're mainly dealing with BO eviction
>> synchronization? The only problem I see on our side is that calls to our
>> move() hook aren't guaranteed to be carried out in order (because of the
>> multiple hardware channels). I'm thinking that move() could be extended
>> with an optional sync_obj argument, that way move() would be able to
>> make sure that evictions are strictly ordered with respect to the fence
>> specified.
>>      
The way evictions will work is that they appear to take place 
"instantly", but are actually scheduled on a channel, and there will be 
a data structure that keeps track of which fences need to signal before 
a managed area can be reused.
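
To make the bookkeeping concrete, here is a rough sketch of such a 
tracking structure. Everything in it is an assumption made for 
illustration: the names (sketch_area, sketch_area_add_fence, 
sketch_area_reusable), the fixed-size table, and the simplification that 
all fences live on a single channel and are plain sequence numbers. It 
is not existing TTM or Nouveau code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PENDING 4	/* arbitrary size for the illustration */

/* Hypothetical per-area bookkeeping; not an existing TTM structure. */
struct sketch_area {
	uint32_t pending[SKETCH_MAX_PENDING];	/* seqs of queued copies */
	size_t num_pending;
};

/*
 * Record that a copy touching this area was queued with sequence "seq".
 * Returns false when the table is full; a caller would then have to
 * wait for an older fence before queuing more work.
 */
static bool sketch_area_add_fence(struct sketch_area *area, uint32_t seq)
{
	assert(area != NULL);
	if (area->num_pending == SKETCH_MAX_PENDING)
		return false;
	area->pending[area->num_pending++] = seq;
	return true;
}

/*
 * Drop every fence the hardware has already passed and report whether
 * the area may be reused. "signaled_seq" is the last sequence number
 * the (single, assumed) channel reported as completed. The signed
 * difference keeps the comparison valid across 32-bit wraparound.
 */
static bool sketch_area_reusable(struct sketch_area *area,
				 uint32_t signaled_seq)
{
	size_t i, kept = 0;

	assert(area != NULL);
	for (i = 0; i < area->num_pending; i++) {
		if ((int32_t)(area->pending[i] - signaled_seq) > 0)
			area->pending[kept++] = area->pending[i];
	}
	area->num_pending = kept;
	return kept == 0;
}
```

The point is that the eviction call itself returns immediately after 
sketch_area_add_fence(); only the code that wants to hand the area to 
someone else has to consult sketch_area_reusable().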

The driver will need to provide a function that, given a list of fences, 
returns a fence whose signaling guarantees that all other fences in the 
list have signaled.
Single-channel hardware will just return the fence with the highest 
sequence. Multi-channel hardware may need to insert command stream 
barriers, if available, and create a new sync object to return, or 
resort to simply waiting to determine which fence signals last.
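
For the single-channel case, a minimal sketch of that driver function 
could look like the following. The struct layout and the names 
(sketch_fence, sketch_fence_last) are made up for illustration, and 
returning NULL for fences from different channels is just one possible 
way to tell the caller it has to fall back to a barrier or a wait.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence object; field names are illustrative only. */
struct sketch_fence {
	uint32_t channel;	/* hardware channel that signals the fence */
	uint32_t seq;		/* per-channel sequence number, may wrap */
};

/* Wrap-safe test for "a was emitted after b" on the same channel. */
static int sketch_seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/*
 * Given a non-empty list of fences, return the one whose signaling
 * implies that all the others have signaled. That is only possible
 * without extra work when every fence is on the same channel, where it
 * is simply the fence with the highest sequence. For mixed channels
 * this sketch returns NULL, and the caller would have to insert a
 * command stream barrier or wait.
 */
static const struct sketch_fence *
sketch_fence_last(const struct sketch_fence *const *list, size_t n)
{
	const struct sketch_fence *last;
	size_t i;

	assert(list != NULL && n > 0);
	last = list[0];
	for (i = 1; i < n; i++) {
		if (list[i]->channel != last->channel)
			return NULL;
		if (sketch_seq_after(list[i]->seq, last->seq))
			last = list[i];
	}
	return last;
}
```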

I guess Nouveau can do command stream barriers (waiting for other 
channels to reach a certain command before progressing)?

Needless to say, drivers need not activate async operation if they don't 
want to, but for single-channel hardware it will hopefully be very simple.


>>      
>>> While doing that I looked a bit at the Nouveau fencing. It appears
>>> like waiting for fences is polling only (no irq to signal fences)? Is
>>> that correct?
>>>
>>>        
>> That's right, nvidia hardware has no nice way to schedule a fence-like
>> interrupt we could selectively turn on and off around the sync_obj_wait
>> hook. There's a bunch of (more or less) chipset-specific hacks that
>> could be used to get an equivalent effect, but polling has seemed good
>> enough so far (in the typical case we only take the "lazy" path so CPU
>> usage is still OK).
>>      

Indeed, I saw the same with Unichromes: lazy for throttling and non-lazy 
for other waits. I did end up with an hrtimer polling loop in the 
non-lazy case, though, since software fallbacks tended to eat a lot of 
CPU while waiting for buffer idle.
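
As a user-space illustration of the lazy versus non-lazy distinction 
(not the actual driver code, which used an hrtimer in the kernel), a 
polling wait might be sketched as below. The function names, the 
callback interface, and the two sleep intervals are all assumptions 
picked for the example; nanosleep() stands in for the timer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: returns true once the fence has signaled. */
typedef bool (*sketch_signaled_fn)(void *ctx);

/*
 * Poll until the fence signals or roughly timeout_ns of sleeping has
 * accumulated. A lazy wait sleeps for a long interval between polls; a
 * non-lazy wait still sleeps, but for a much shorter interval, so it
 * reacts quickly without spinning on the CPU. The intervals (1 ms and
 * 50 us) are arbitrary. Returns 0 on success, -1 on timeout.
 */
static int sketch_fence_wait(sketch_signaled_fn signaled, void *ctx,
			     bool lazy, long timeout_ns)
{
	const long step_ns = lazy ? 1000000L : 50000L;
	struct timespec ts = { 0, step_ns };
	long waited_ns = 0;

	assert(signaled != NULL);
	while (!signaled(ctx)) {
		if (waited_ns >= timeout_ns)
			return -1;
		nanosleep(&ts, NULL);
		waited_ns += step_ns;
	}
	return 0;
}

/*
 * Stand-in for reading a hardware sequence register: reports "not
 * signaled" for the first *ctx polls, then "signaled".
 */
static bool sketch_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

The timeout here only counts requested sleep time, so it is a lower 
bound on the real elapsed time; a real implementation would compare 
against a monotonic clock instead.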

/Thomas