* Nouveau fences?
From: Thomas Hellstrom @ 2010-11-28 12:39 UTC (permalink / raw)
To: Ben Skeggs; +Cc: Ben Skeggs, dri-devel@lists.freedesktop.org
Ben,
I'm looking at a way to make TTM memory management asynchronous with the
CPU. The idea is that you should basically be able to DMA data to and
from memory regions without waiting for idle, as long as the GPU has a
means to provide operation ordering.
While doing that I looked a bit at the Nouveau fencing. It appears that
waiting for fences is polling only (no IRQ to signal fences). Is that
correct?
/Thomas
* Re: Nouveau fences?
From: Francisco Jerez @ 2010-11-28 14:12 UTC (permalink / raw)
To: Thomas Hellstrom; +Cc: dri-devel@lists.freedesktop.org
Thomas Hellstrom <thomas@shipmail.org> writes:
> Ben,
>
> I'm looking at a way to make TTM memory management asynchronous with
> the CPU. The idea is that you should basically be able to DMA data to
> and from memory regions without waiting for idle, as long as the GPU
> has a means to provide operation ordering.
>
Sounds good. I guess you're mainly dealing with BO eviction
synchronization? The only problem I see on our side is that calls to our
move() hook aren't guaranteed to be carried out in order (because of the
multiple hardware channels). I'm thinking that move() could be extended
with an optional sync_obj argument, that way move() would be able to
make sure that evictions are strictly ordered with respect to the fence
specified.
> While doing that I looked a bit at the Nouveau fencing. It appears
> like waiting for fences is polling only (no irq to signal fences)? Is
> that correct?
>
That's right, nvidia hardware has no nice way to schedule a fence-like
interrupt we could selectively turn on and off around the sync_obj_wait
hook. There's a bunch of (more or less) chipset-specific hacks that
could be used to get an equivalent effect, but polling has seemed good
enough so far (in the typical case we only take the "lazy" path so CPU
usage is still OK).
Unconditional PFIFO CACHE interrupts might be an option too, but I'm a
bit afraid of the PFIFO stalls and useless IRQ storms some applications
could trigger.
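To make the polling approach above concrete, here is a minimal sketch in plain C of a polled fence wait with a "lazy" and a busy variant. It is not Nouveau's actual code: `struct poll_fence`, `poll_fence_signaled()`, `poll_fence_wait()` and the `nap` callback are hypothetical names, and it assumes each channel exposes a 32-bit counter that it advances as commands complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence: signaled once the channel counter reaches seq. */
struct poll_fence {
    const volatile uint32_t *hw_seq; /* last sequence the channel completed */
    uint32_t seq;                    /* sequence this fence waits for */
};

/* Wraparound-safe test: has the counter reached (or passed) seq? */
static bool poll_fence_signaled(const struct poll_fence *f)
{
    return (int32_t)(*f->hw_seq - f->seq) >= 0;
}

/*
 * Poll up to max_polls times. A non-NULL nap callback is called between
 * polls (the "lazy" path, e.g. a short sleep); NULL means busy polling.
 * Returns 0 if the fence signaled, -1 if the poll budget ran out.
 */
static int poll_fence_wait(const struct poll_fence *f, void (*nap)(void),
                           unsigned long max_polls)
{
    assert(f != NULL && f->hw_seq != NULL);
    for (unsigned long i = 0; i < max_polls; i++) {
        if (poll_fence_signaled(f))
            return 0;
        if (nap != NULL)
            nap(); /* give up the CPU between polls */
    }
    return -1;
}
```

The signed difference keeps the comparison correct when the counter wraps, as long as the two sequence numbers are less than 2^31 apart.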
> /Thomas
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
* Re: Nouveau fences?
From: Francisco Jerez @ 2010-11-28 16:11 UTC (permalink / raw)
To: Thomas Hellstrom; +Cc: dri-devel
Francisco Jerez <currojerez@riseup.net> writes:
> Thomas Hellstrom <thomas@shipmail.org> writes:
>
>> Ben,
>>
>> I'm looking at a way to make TTM memory management asynchronous with
>> the CPU. The idea is that you should basically be able to DMA data to
>> and from memory regions without waiting for idle, as long as the GPU
>> has a means to provide operation ordering.
>>
> Sounds good. I guess you're mainly dealing with BO eviction
> synchronization? The only problem I see on our side is that calls to our
> move() hook aren't guaranteed to be carried out in order (because of the
> multiple hardware channels). I'm thinking that move() could be extended
> with an optional sync_obj argument, that way move() would be able to
> make sure that evictions are strictly ordered with respect to the fence
> specified.
>
>> While doing that I looked a bit at the Nouveau fencing. It appears
>> like waiting for fences is polling only (no irq to signal fences)? Is
>> that correct?
>>
> That's right, nvidia hardware has no nice way to schedule a fence-like
> interrupt we could selectively turn on and off around the sync_obj_wait
> hook. There's a bunch of (more or less) chipset-specific hacks that
> could be used to get an equivalent effect, but polling has seemed good
> enough so far (in the typical case we only take the "lazy" path so CPU
> usage is still OK).
>
> Unconditional PFIFO CACHE interrupts might be an option too, but, I'm a
> bit afraid of the PFIFO stalls and useless IRQ storms some applications
> could trigger.
>
Meh, apparently this one couldn't make it through; some spam filter has
decided I'm a spammer for some reason...
>> /Thomas
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
* Re: Nouveau fences?
From: Thomas Hellstrom @ 2010-11-28 20:37 UTC (permalink / raw)
To: Francisco Jerez; +Cc: dri-devel
On 11/28/2010 05:11 PM, Francisco Jerez wrote:
> Francisco Jerez<currojerez@riseup.net> writes:
>
>
>> Thomas Hellstrom<thomas@shipmail.org> writes:
>>
>>
>>> Ben,
>>>
>>> I'm looking at a way to make TTM memory management asynchronous with
>>> the CPU. The idea is that you should basically be able to DMA data to
>>> and from memory regions without waiting for idle, as long as the GPU
>>> has a means to provide operation ordering.
>>>
>>>
>> Sounds good. I guess you're mainly dealing with BO eviction
>> synchronization? The only problem I see on our side is that calls to our
>> move() hook aren't guaranteed to be carried out in order (because of the
>> multiple hardware channels). I'm thinking that move() could be extended
>> with an optional sync_obj argument, that way move() would be able to
>> make sure that evictions are strictly ordered with respect to the fence
>> specified.
>>
The way evictions will work is that they appear to take place
"instantly", but are scheduled on a channel, and there will be a data
structure that keeps track of which fences need to be signaled before
a managed area can be reused.
The driver will need to provide a function that, given a list of fences,
returns a fence that, when it signals, guarantees that all other fences
in the list have signaled.
Single-channel hardware will just return the fence with the highest
sequence. Multi-channel hardware may need to insert command stream
barriers if available and create a new sync object to return or resort
to simply waiting to determine which fence signals last.
I guess Nouveau can do command stream barriers (waiting for other
channels to reach a certain command before progressing)?
Needless to say, drivers need not activate async operation if they don't
want to, but for single-channel hardware it will hopefully be very simple.
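As a rough illustration of the fence-ordering function described above, here is a minimal sketch in plain C. It is not TTM or driver code: `struct ord_fence` and `ord_last_fence()` are hypothetical names, and it assumes fences on one channel signal in sequence order. For fences on different channels it returns NULL, leaving the caller to insert a barrier and create a new sync object, or to wait.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence: a per-channel sequence number. */
struct ord_fence {
    int channel;  /* channel the fence was emitted on */
    uint32_t seq; /* position in that channel's command stream */
};

/*
 * Return the fence whose signaling implies that every fence in the list
 * has signaled, or NULL when the list spans several channels and no such
 * fence can be picked without a barrier or an explicit wait.
 */
static const struct ord_fence *
ord_last_fence(const struct ord_fence *const *fences, size_t count)
{
    const struct ord_fence *last;

    assert(fences != NULL && count > 0);
    last = fences[0];
    for (size_t i = 1; i < count; i++) {
        if (fences[i]->channel != last->channel)
            return NULL; /* no ordering known across channels */
        /* Wraparound-safe "later than" comparison. */
        if ((int32_t)(fences[i]->seq - last->seq) > 0)
            last = fences[i];
    }
    return last;
}
```

On single-channel hardware the channel check never fails, so this reduces to picking the highest sequence number.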
>>
>>> While doing that I looked a bit at the Nouveau fencing. It appears
>>> like waiting for fences is polling only (no irq to signal fences)? Is
>>> that correct?
>>>
>>>
>> That's right, nvidia hardware has no nice way to schedule a fence-like
>> interrupt we could selectively turn on and off around the sync_obj_wait
>> hook. There's a bunch of (more or less) chipset-specific hacks that
>> could be used to get an equivalent effect, but polling has seemed good
>> enough so far (in the typical case we only take the "lazy" path so CPU
>> usage is still OK).
>>
Indeed, I saw the same with Unichromes: lazy for throttling and non-lazy
for other waits, although I ended up with a hrtimer polling loop in the
non-lazy case, since software fallbacks tended to eat a lot of CPU while
waiting for buffer idle.
/Thomas
* Re: Nouveau fences?
From: Francisco Jerez @ 2010-11-28 21:55 UTC (permalink / raw)
To: Thomas Hellstrom; +Cc: dri-devel
Thomas Hellstrom <thomas@shipmail.org> writes:
> On 11/28/2010 05:11 PM, Francisco Jerez wrote:
>> Francisco Jerez<currojerez@riseup.net> writes:
>>
>>
>>> Thomas Hellstrom<thomas@shipmail.org> writes:
>>>
>>>
>>>> Ben,
>>>>
>>>> I'm looking at a way to make TTM memory management asynchronous with
>>>> the CPU. The idea is that you should basically be able to DMA data to
>>>> and from memory regions without waiting for idle, as long as the GPU
>>>> has a means to provide operation ordering.
>>>>
>>>>
>>> Sounds good. I guess you're mainly dealing with BO eviction
>>> synchronization? The only problem I see on our side is that calls to our
>>> move() hook aren't guaranteed to be carried out in order (because of the
>>> multiple hardware channels). I'm thinking that move() could be extended
>>> with an optional sync_obj argument, that way move() would be able to
>>> make sure that evictions are strictly ordered with respect to the fence
>>> specified.
>>>
> The way evictions will work is that they appear to take place
> "instantly", but are scheduled on a channel, and there will be a data
> structure that keeps track about what fences need to be signaled
> before a managed area can be reused.
>
> The driver will need to provide a function that, given a list of
> fences, returns a fence that when it signals, guarantees that all
> other fences in the list have signaled.
Ah, so, evictions made in response to ttm_bo_mem_force_space() are still
going to be synchronous after the changes you have in mind (because in
that case you need to reuse the freed memory immediately), right?
In other cases (e.g. evictions triggered by BO validation), what exactly
would we gain from this function? I mean, why can't we just push waiting
down to ttm_bo_move_ttm/memcpy?
> Single-channel hardware will just return the fence with the highest
> sequence. Multi-channel hardware may need to insert command stream
> barriers if available and create a new sync object to return or resort
> to simply waiting to determine which fence signals last.
>
> I guess Nouveau can do command stream barriers, (waiting for other
> channels to reach a certain command before progressing?)
>
Yep, that's what nouveau_fence_sync() does.
> Needless to say, drivers need not activate async operation if they
> don't want to, but for single-channel hardware it will hopefully be
> very simple.
>
>
* Re: Nouveau fences?
From: Thomas Hellstrom @ 2010-11-29 7:28 UTC (permalink / raw)
To: Francisco Jerez; +Cc: dri-devel
On 11/28/2010 10:55 PM, Francisco Jerez wrote:
> Thomas Hellstrom<thomas@shipmail.org> writes:
>
>
>> On 11/28/2010 05:11 PM, Francisco Jerez wrote:
>>
>>> Francisco Jerez<currojerez@riseup.net> writes:
>>>
>>>
>>>
>>>> Thomas Hellstrom<thomas@shipmail.org> writes:
>>>>
>>>>
>>>>
>>>>> Ben,
>>>>>
>>>>> I'm looking at a way to make TTM memory management asynchronous with
>>>>> the CPU. The idea is that you should basically be able to DMA data to
>>>>> and from memory regions without waiting for idle, as long as the GPU
>>>>> has a means to provide operation ordering.
>>>>>
>>>>>
>>>>>
>>>> Sounds good. I guess you're mainly dealing with BO eviction
>>>> synchronization? The only problem I see on our side is that calls to our
>>>> move() hook aren't guaranteed to be carried out in order (because of the
>>>> multiple hardware channels). I'm thinking that move() could be extended
>>>> with an optional sync_obj argument, that way move() would be able to
>>>> make sure that evictions are strictly ordered with respect to the fence
>>>> specified.
>>>>
>>>>
>> The way evictions will work is that they appear to take place
>> "instantly", but are scheduled on a channel, and there will be a data
>> structure that keeps track about what fences need to be signaled
>> before a managed area can be reused.
>>
>> The driver will need to provide a function that, given a list of
>> fences, returns a fence that when it signals, guarantees that all
>> other fences in the list have signaled.
>>
> Ah, so, evictions made in response to ttm_bo_mem_force_space() are still
> going to be synchronous after the changes you have in mind (because in
> that case you need to reuse the freed memory immediately), right?
>
No and yes. Evictions will be asynchronous, but the new user of the
memory area needs to take appropriate action to make sure it doesn't
overwrite old contents. If it's a CPU upload, it needs to wait on a
fence. A single-channel GPU with DMA uploads needs to do nothing. A
multi-channel GPU needs to insert a barrier before uploading that waits
on the eviction DMA.
So you're right in that we need to give the new move function
information on what to wait on / insert barriers for. I was initially
thinking of a single fence object (and that's why the order function is
needed).
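The three cases above can be sketched as a small decision function in plain C. This is only an illustration: the enum and function names are hypothetical, and it assumes that commands queued on the same channel execute in order, which is what lets a same-channel DMA upload skip synchronization.

```c
#include <assert.h>

/* Who is about to write into the memory area freed by an eviction. */
enum reuse_user { REUSE_CPU_UPLOAD, REUSE_GPU_DMA };

/* What that user must do before overwriting the old contents. */
enum reuse_action { REUSE_NOTHING, REUSE_WAIT_FENCE, REUSE_INSERT_BARRIER };

/*
 * evict_channel is the channel the eviction DMA was queued on;
 * upload_channel is the channel of the new upload and is ignored for
 * CPU uploads.
 */
static enum reuse_action
reuse_action_for(enum reuse_user user, int evict_channel, int upload_channel)
{
    assert(evict_channel >= 0);
    if (user == REUSE_CPU_UPLOAD)
        return REUSE_WAIT_FENCE;     /* CPU must wait for the eviction fence */
    if (upload_channel == evict_channel)
        return REUSE_NOTHING;        /* same channel: already ordered */
    return REUSE_INSERT_BARRIER;     /* other channel: wait on eviction DMA */
}
```

Single-channel hardware always takes the second branch for DMA uploads, which is why it needs no extra work.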
> In other cases (e.g. evictions triggered by BO validation), what exactly
> would we gain from this function? I mean, why can't we just push waiting
> down to ttm_bo_move_ttm/memcpy?
>
That's essentially what's going to happen, but those functions also need
to know what exactly to wait on.
>
>> Single-channel hardware will just return the fence with the highest
>> sequence. Multi-channel hardware may need to insert command stream
>> barriers if available and create a new sync object to return or resort
>> to simply waiting to determine which fence signals last.
>>
>> I guess Nouveau can do command stream barriers, (waiting for other
>> channels to reach a certain command before progressing?)
>>
>>
> Yep, that's what nouveau_fence_sync() does.
>
OK, thanks.
/Thomas