RFC: Radeon multi ring support branch

All of lore.kernel.org
 help / color / mirror / Atom feed

* RFC: Radeon multi ring support branch
@ 2011-10-29 13:00 Christian König
  2011-10-31 15:05 ` Jerome Glisse
  2011-11-15 19:32 ` Jerome Glisse
  0 siblings, 2 replies; 16+ messages in thread
From: Christian König @ 2011-10-29 13:00 UTC (permalink / raw)
  To: dri-devel

Hello everybody,

to support multiple compute rings, async DMA engines and UVD we need to 
teach the radeon kernel module how to sync buffers between different 
rings and make some changes to the command submission ioctls.

Since we can't release any documentation about async DMA or UVD (yet), 
my current branch concentrates on getting the additional compute rings 
on cayman running. Unfortunately those rings have hardware bugs that 
can't be worked around, so they are actually not very useful in a 
production environment, but they should do quite well for this testing 
purpose.

The branch can be found here: 
http://cgit.freedesktop.org/~deathsimple/linux/log/

Since some of the patches are quite intrusive, constantly rebaseing them 
could get a bit painful. So I would like to see most of the stuff 
included into drm-next, even if we don't make use of the new 
functionality right now.

Comments welcome,
Christian.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-10-29 13:00 RFC: Radeon multi ring support branch Christian König
@ 2011-10-31 15:05 ` Jerome Glisse
  2011-11-02 10:12   ` Christian König
  2011-11-15 19:32 ` Jerome Glisse
  1 sibling, 1 reply; 16+ messages in thread
From: Jerome Glisse @ 2011-10-31 15:05 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Sat, Oct 29, 2011 at 03:00:28PM +0200, Christian König wrote:
> Hello everybody,
> 
> to support multiple compute rings, async DMA engines and UVD we need
> to teach the radeon kernel module how to sync buffers between
> different rings and make some changes to the command submission
> ioctls.
> 
> Since we can't release any documentation about async DMA or UVD
> (yet), my current branch concentrates on getting the additional
> compute rings on cayman running. Unfortunately those rings have
> hardware bugs that can't be worked around, so they are actually not
> very useful in a production environment, but they should do quite
> well for this testing purpose.
> 
> The branch can be found here:
> http://cgit.freedesktop.org/~deathsimple/linux/log/
> 
> Since some of the patches are quite intrusive, constantly rebaseing
> them could get a bit painful. So I would like to see most of the
> stuff included into drm-next, even if we don't make use of the new
> functionality right now.
> 
> Comments welcome,
> Christian.

So for all patches except the interface change see below
Reviewed-by: Jerome Glisse <jglisse@redhat.com>

For the interface change, as discussed previously, i believe prio
should be a userspace argument, kernel could override it.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-10-31 15:05 ` Jerome Glisse
@ 2011-11-02 10:12   ` Christian König
  2011-11-02 14:24     ` Jerome Glisse
  0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2011-11-02 10:12 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: dri-devel

On 31.10.2011 16:05, Jerome Glisse wrote:
> On Sat, Oct 29, 2011 at 03:00:28PM +0200, Christian König wrote:
>> Hello everybody,
>>
>> to support multiple compute rings, async DMA engines and UVD we need
>> to teach the radeon kernel module how to sync buffers between
>> different rings and make some changes to the command submission
>> ioctls.
>>
>> Since we can't release any documentation about async DMA or UVD
>> (yet), my current branch concentrates on getting the additional
>> compute rings on cayman running. Unfortunately those rings have
>> hardware bugs that can't be worked around, so they are actually not
>> very useful in a production environment, but they should do quite
>> well for this testing purpose.
>>
>> The branch can be found here:
>> http://cgit.freedesktop.org/~deathsimple/linux/log/
>>
>> Since some of the patches are quite intrusive, constantly rebaseing
>> them could get a bit painful. So I would like to see most of the
>> stuff included into drm-next, even if we don't make use of the new
>> functionality right now.
>>
>> Comments welcome,
>> Christian.
> So for all patches except the interface change see below
> Reviewed-by: Jerome Glisse<jglisse@redhat.com>
>
> For the interface change, as discussed previously, i believe prio
> should be a userspace argument, kernel could override it.
>
>
Yeah, I'm still not sure what we should do about the priority.

Say for example we have 2 processes. Process A is sending compute jobs 
both with high and low priority, while process B is sending jobs with 
only high priority. Unfortunately the jobs send by B doesn't utilizes 
the hardware to its limits, so that even jobs on a lower priority rings 
get their share of the compute resources.

The effect is that it reverses the priority A wants for its jobs. The 
high priority jobs of A get executed much slower than the low priority 
jobs of A, because B is spamming the hight priority ring with its under 
utilizing jobs.

In such a situation it would be better to adjust the job scheduling a 
bit so that jobs of process A gets on ring 1 on jobs from B get on ring 
2, but I have now idea how to detect such a situation. Anyway, the 
primary goal of the different compute rings is to separate compute from 
GFX so that even with big compute jobs running the system still stays 
responsible to user input, so I think adding a better scheduling for 
compute jobs can be done much later.

Christian.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-11-02 10:12   ` Christian König
@ 2011-11-02 14:24     ` Jerome Glisse
  0 siblings, 0 replies; 16+ messages in thread
From: Jerome Glisse @ 2011-11-02 14:24 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Wed, Nov 02, 2011 at 11:12:42AM +0100, Christian König wrote:
> On 31.10.2011 16:05, Jerome Glisse wrote:
> >On Sat, Oct 29, 2011 at 03:00:28PM +0200, Christian König wrote:
> >>Hello everybody,
> >>
> >>to support multiple compute rings, async DMA engines and UVD we need
> >>to teach the radeon kernel module how to sync buffers between
> >>different rings and make some changes to the command submission
> >>ioctls.
> >>
> >>Since we can't release any documentation about async DMA or UVD
> >>(yet), my current branch concentrates on getting the additional
> >>compute rings on cayman running. Unfortunately those rings have
> >>hardware bugs that can't be worked around, so they are actually not
> >>very useful in a production environment, but they should do quite
> >>well for this testing purpose.
> >>
> >>The branch can be found here:
> >>http://cgit.freedesktop.org/~deathsimple/linux/log/
> >>
> >>Since some of the patches are quite intrusive, constantly rebaseing
> >>them could get a bit painful. So I would like to see most of the
> >>stuff included into drm-next, even if we don't make use of the new
> >>functionality right now.
> >>
> >>Comments welcome,
> >>Christian.
> >So for all patches except the interface change see below
> >Reviewed-by: Jerome Glisse<jglisse@redhat.com>
> >
> >For the interface change, as discussed previously, i believe prio
> >should be a userspace argument, kernel could override it.
> >
> >
> Yeah, I'm still not sure what we should do about the priority.
> 
> Say for example we have 2 processes. Process A is sending compute
> jobs both with high and low priority, while process B is sending
> jobs with only high priority. Unfortunately the jobs send by B
> doesn't utilizes the hardware to its limits, so that even jobs on a
> lower priority rings get their share of the compute resources.
> 
> The effect is that it reverses the priority A wants for its jobs.
> The high priority jobs of A get executed much slower than the low
> priority jobs of A, because B is spamming the hight priority ring
> with its under utilizing jobs.
> 
> In such a situation it would be better to adjust the job scheduling
> a bit so that jobs of process A gets on ring 1 on jobs from B get on
> ring 2, but I have now idea how to detect such a situation. Anyway,
> the primary goal of the different compute rings is to separate
> compute from GFX so that even with big compute jobs running the
> system still stays responsible to user input, so I think adding a
> better scheduling for compute jobs can be done much later.
> 
> Christian.

I think this was already pointed out, my idea was to have the prio
argument in ioctl and have kernel override it. For instance i was
thinking the drm master could set for each process what is the
top priority this process can get, so which ever process is drm
master would choose. Or we could do some kind of userspace daemon
that would have special right from kernel/drm pov and would be
able to set the max priority each process get on the gpu.

But as a first step just getting the prio as an argument sounds
what we should do as right now we won't be doing anykind of
real GPU scheduling.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-10-29 13:00 RFC: Radeon multi ring support branch Christian König
  2011-10-31 15:05 ` Jerome Glisse
@ 2011-11-15 19:32 ` Jerome Glisse
  2011-11-15 23:19   ` Christian König
  1 sibling, 1 reply; 16+ messages in thread
From: Jerome Glisse @ 2011-11-15 19:32 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Sat, Oct 29, 2011 at 03:00:28PM +0200, Christian König wrote:
> Hello everybody,
> 
> to support multiple compute rings, async DMA engines and UVD we need
> to teach the radeon kernel module how to sync buffers between
> different rings and make some changes to the command submission
> ioctls.
> 
> Since we can't release any documentation about async DMA or UVD
> (yet), my current branch concentrates on getting the additional
> compute rings on cayman running. Unfortunately those rings have
> hardware bugs that can't be worked around, so they are actually not
> very useful in a production environment, but they should do quite
> well for this testing purpose.
> 
> The branch can be found here:
> http://cgit.freedesktop.org/~deathsimple/linux/log/
> 
> Since some of the patches are quite intrusive, constantly rebaseing
> them could get a bit painful. So I would like to see most of the
> stuff included into drm-next, even if we don't make use of the new
> functionality right now.
> 
> Comments welcome,
> Christian.

So i have been looking back at all this and now there is somethings
puzzling me. So if semaphore wait for a non null value at gpu address
provided in the packet than the current implementation for the cs
ioctl doesn't work when there is more than 2 rings to sync.

http://cgit.freedesktop.org/~deathsimple/linux/commit/?h=multi-ring-testing&id=bae372811c697a889ff0cf9128f52fe914d0fe1b

As it will use only one semaphore so first ring to finish will
mark the semaphore as done even if there is still other ring not
done.

This all make me wonder if some change to cs ioctl would make
all this better. So idea of semaphore is to wait for some other
ring to finish something. So let say we have following scenario:
Application submit following to ring1: csA, csB
Application now submit to ring2: cs1, cs2 
And application want csA to be done for cs1 and csB to be done
for cs2.

To achieve such usage pattern we would need to return fence seq
or similar from the cs ioctl. So user application would know
ringid+fence_seq for csA & csB and provide this when scheduling
cs1 & cs2. Here i am assuming MEM_WRITE/WAIT_REG_MEM packet
are as good as MEM_SEMAPHORE packet. Ie the semaphore packet
doesn't give us much more than MEM_WRITE/WAIT_REG_MEM would.

To achieve that each ring got it's fence scratch addr where to
write seq number. And then we use WAIT_REG_MEM on this addr
and with the specific seq for the other ring that needs
synchronization. This would simplify the semaphore code as
we wouldn't need somethings new beside helper function and
maybe extending the fence structure.

Anyway i put updating ring patch at :
http://people.freedesktop.org/~glisse/mrings/
It rebased on top of linus tree and it has several space
indentation fixes and also a fix for no semaphore allocated
issue (patch 5)

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-11-15 19:32 ` Jerome Glisse
@ 2011-11-15 23:19   ` Christian König
  2011-11-16  0:24     ` Jerome Glisse
  0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2011-11-15 23:19 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: dri-devel

On 15.11.2011 20:32, Jerome Glisse wrote:
> On Sat, Oct 29, 2011 at 03:00:28PM +0200, Christian König wrote:
>> Hello everybody,
>>
>> to support multiple compute rings, async DMA engines and UVD we need
>> to teach the radeon kernel module how to sync buffers between
>> different rings and make some changes to the command submission
>> ioctls.
>>
>> Since we can't release any documentation about async DMA or UVD
>> (yet), my current branch concentrates on getting the additional
>> compute rings on cayman running. Unfortunately those rings have
>> hardware bugs that can't be worked around, so they are actually not
>> very useful in a production environment, but they should do quite
>> well for this testing purpose.
>>
>> The branch can be found here:
>> http://cgit.freedesktop.org/~deathsimple/linux/log/
>>
>> Since some of the patches are quite intrusive, constantly rebaseing
>> them could get a bit painful. So I would like to see most of the
>> stuff included into drm-next, even if we don't make use of the new
>> functionality right now.
>>
>> Comments welcome,
>> Christian.
> So i have been looking back at all this and now there is somethings
> puzzling me. So if semaphore wait for a non null value at gpu address
> provided in the packet than the current implementation for the cs
> ioctl doesn't work when there is more than 2 rings to sync.
Semaphores are counters, so each signal operation is atomically 
incrementing the counter, while each wait operation is (atomically) 
checking if the counter is above zero and if it is decrement it. So you 
can signal/wait on the same address multiple times.

>
> As it will use only one semaphore so first ring to finish will
> mark the semaphore as done even if there is still other ring not
> done.
Nope, each wait operation will wait for a separate signal operation (at 
least I think so).
>
> This all make me wonder if some change to cs ioctl would make
> all this better. So idea of semaphore is to wait for some other
> ring to finish something. So let say we have following scenario:
> Application submit following to ring1: csA, csB
> Application now submit to ring2: cs1, cs2
> And application want csA to be done for cs1 and csB to be done
> for cs2.
>
> To achieve such usage pattern we would need to return fence seq
> or similar from the cs ioctl. So user application would know
> ringid+fence_seq for csA&  csB and provide this when scheduling
> cs1&  cs2. Here i am assuming MEM_WRITE/WAIT_REG_MEM packet
> are as good as MEM_SEMAPHORE packet. Ie the semaphore packet
> doesn't give us much more than MEM_WRITE/WAIT_REG_MEM would.
>
> To achieve that each ring got it's fence scratch addr where to
> write seq number. And then we use WAIT_REG_MEM on this addr
> and with the specific seq for the other ring that needs
> synchronization. This would simplify the semaphore code as
> we wouldn't need somethings new beside helper function and
> maybe extending the fence structure.
I played around with the same Idea before implementing the whole 
semaphore stuff, but the killer argument against it is that not all 
rings support the WAIT_REG_MEM command.

Also the semaphores are much more efficient than the WAIT_REG_MEM 
command, because all semaphore commands from the different rings are 
send to a central semaphore block, so that constant polling, and with it 
constant memory access can be avoided. Additional to that the 
WAIT_REG_MEM command has a minimum latency of Wait_Interval*16 clock 
cycles, while semaphore need 4 clock cycles for the command and 4 clock 
cycles for the result, so it definitely has a much lower latency.

We should also keep in mind that the semaphore block is not only capable 
to sync between different rings inside a single GPU, but can also 
communicate with another semaphore block in a second GPU. So it is a key 
part in a multi GPU environment.

> Anyway i put updating ring patch at :
> http://people.freedesktop.org/~glisse/mrings/
> It rebased on top of linus tree and it has several space
> indentation fixes and also a fix for no semaphore allocated
> issue (patch 5)
>
Thanks, I will try to integrate the changes tomorrow.

Christian.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-11-15 23:19   ` Christian König
@ 2011-11-16  0:24     ` Jerome Glisse
  2011-11-17 16:24       ` Christian König
  0 siblings, 1 reply; 16+ messages in thread
From: Jerome Glisse @ 2011-11-16  0:24 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

2011/11/15 Christian König <deathsimple@vodafone.de>:
> On 15.11.2011 20:32, Jerome Glisse wrote:
>>
>> On Sat, Oct 29, 2011 at 03:00:28PM +0200, Christian König wrote:
>>>
>>> Hello everybody,
>>>
>>> to support multiple compute rings, async DMA engines and UVD we need
>>> to teach the radeon kernel module how to sync buffers between
>>> different rings and make some changes to the command submission
>>> ioctls.
>>>
>>> Since we can't release any documentation about async DMA or UVD
>>> (yet), my current branch concentrates on getting the additional
>>> compute rings on cayman running. Unfortunately those rings have
>>> hardware bugs that can't be worked around, so they are actually not
>>> very useful in a production environment, but they should do quite
>>> well for this testing purpose.
>>>
>>> The branch can be found here:
>>> http://cgit.freedesktop.org/~deathsimple/linux/log/
>>>
>>> Since some of the patches are quite intrusive, constantly rebaseing
>>> them could get a bit painful. So I would like to see most of the
>>> stuff included into drm-next, even if we don't make use of the new
>>> functionality right now.
>>>
>>> Comments welcome,
>>> Christian.
>>
>> So i have been looking back at all this and now there is somethings
>> puzzling me. So if semaphore wait for a non null value at gpu address
>> provided in the packet than the current implementation for the cs
>> ioctl doesn't work when there is more than 2 rings to sync.
>
> Semaphores are counters, so each signal operation is atomically incrementing
> the counter, while each wait operation is (atomically) checking if the
> counter is above zero and if it is decrement it. So you can signal/wait on
> the same address multiple times.

Yup i did understand that right.

>>
>> As it will use only one semaphore so first ring to finish will
>> mark the semaphore as done even if there is still other ring not
>> done.
>
> Nope, each wait operation will wait for a separate signal operation (at
> least I think so).

Well as we don't specify on which value semaphore should wait on, i am
prety sure the first ring to increment the semaphore will unblock all
waiter. So if you have ring1 that want to wait on ring2 and ring3 as
soon as ring2 or ring3 is done ring1 will go one while either ring2 or
ring3 might not be done. I will test that tomorrow but from doc i have
it seems so. Thus it will be broken with more than one ring, that
would mean you have to allocate one semaphore for each ring couple you
want to synchronize. Note that the usual case will likely be sync btw
2 ring.

>>
>> This all make me wonder if some change to cs ioctl would make
>> all this better. So idea of semaphore is to wait for some other
>> ring to finish something. So let say we have following scenario:
>> Application submit following to ring1: csA, csB
>> Application now submit to ring2: cs1, cs2
>> And application want csA to be done for cs1 and csB to be done
>> for cs2.
>>
>> To achieve such usage pattern we would need to return fence seq
>> or similar from the cs ioctl. So user application would know
>> ringid+fence_seq for csA&  csB and provide this when scheduling
>> cs1&  cs2. Here i am assuming MEM_WRITE/WAIT_REG_MEM packet
>> are as good as MEM_SEMAPHORE packet. Ie the semaphore packet
>> doesn't give us much more than MEM_WRITE/WAIT_REG_MEM would.
>>
>> To achieve that each ring got it's fence scratch addr where to
>> write seq number. And then we use WAIT_REG_MEM on this addr
>> and with the specific seq for the other ring that needs
>> synchronization. This would simplify the semaphore code as
>> we wouldn't need somethings new beside helper function and
>> maybe extending the fence structure.
>
> I played around with the same Idea before implementing the whole semaphore
> stuff, but the killer argument against it is that not all rings support the
> WAIT_REG_MEM command.
>
> Also the semaphores are much more efficient than the WAIT_REG_MEM command,
> because all semaphore commands from the different rings are send to a
> central semaphore block, so that constant polling, and with it constant
> memory access can be avoided. Additional to that the WAIT_REG_MEM command
> has a minimum latency of Wait_Interval*16 clock cycles, while semaphore need
> 4 clock cycles for the command and 4 clock cycles for the result, so it
> definitely has a much lower latency.

Yup that was my guess too that semaphore block optimize things

> We should also keep in mind that the semaphore block is not only capable to
> sync between different rings inside a single GPU, but can also communicate
> with another semaphore block in a second GPU. So it is a key part in a multi
> GPU environment.
>
>> Anyway i put updating ring patch at :
>> http://people.freedesktop.org/~glisse/mrings/
>> It rebased on top of linus tree and it has several space
>> indentation fixes and also a fix for no semaphore allocated
>> issue (patch 5)
>>
> Thanks, I will try to integrate the changes tomorrow.
>

After retesting the first patch  drm/radeon: fix debugfs handling is NAK
a complete no go.

Issue is that radeon_debugfs_cleanup is call after rdev is free. This
is why i used a static array. I forgot about that, i should have put a
comment. I guess you built your kernel without debugfs or that you
didn't tested to reload the module.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-11-16  0:24     ` Jerome Glisse
@ 2011-11-17 16:24       ` Christian König
  2011-11-17 16:58         ` Jerome Glisse
  2011-11-17 20:16         ` Alex Deucher
  0 siblings, 2 replies; 16+ messages in thread
From: Christian König @ 2011-11-17 16:24 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: dri-devel

On 16.11.2011 01:24, Jerome Glisse wrote:
> Well as we don't specify on which value semaphore should wait on, i am 
> prety sure the first ring to increment the semaphore will unblock all 
> waiter. So if you have ring1 that want to wait on ring2 and ring3 as 
> soon as ring2 or ring3 is done ring1 will go one while either ring2 or 
> ring3 might not be done. I will test that tomorrow but from doc i have 
> it seems so. Thus it will be broken with more than one ring, that 
> would mean you have to allocate one semaphore for each ring couple you 
> want to synchronize. Note that the usual case will likely be sync btw 
> 2 ring.

Good point, but I played with it a bit more today and it is just 
behaving as I thought it would be. A single signal command will just 
unblock a single waiter, even if there are multiple waiters currently 
for this semaphore, the only thing you can't tell is which waiter will 
come first.

I should also note that the current algorithm will just emit multiple 
wait operations to a single ring, and spread the signal operations to 
all other rings we are interested in. That isn't very efficient, but 
should indeed work quite fine.

> After retesting the first patch  drm/radeon: fix debugfs handling is NAK
> a complete no go.
>
> Issue is that radeon_debugfs_cleanup is call after rdev is free. This
> is why i used a static array. I forgot about that, i should have put a
> comment. I guess you built your kernel without debugfs or that you
> didn't tested to reload the module.

Mhm, I have tested it, seen the crash, and didn't thought that this is a 
problem. Don't ask me why I can't understand it myself right now.

Anyway, I moved the unregistering of the files into a separate function, 
which is now called from radeon_device_fini instead of 
radeon_debugfs_cleanup. That seems to work fine, at least if I haven't 
missed something else.

I also merged your indention fixes and the fix for the never allocated 
semaphores and pushed the result into my public repository 
(http://cgit.freedesktop.org/~deathsimple/linux), so please take another 
look at it.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-11-17 16:24       ` Christian König
@ 2011-11-17 16:58         ` Jerome Glisse
  2011-11-18 12:19           ` Christian König
  2011-11-17 20:16         ` Alex Deucher
  1 sibling, 1 reply; 16+ messages in thread
From: Jerome Glisse @ 2011-11-17 16:58 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

2011/11/17 Christian König <deathsimple@vodafone.de>:
> On 16.11.2011 01:24, Jerome Glisse wrote:
>>
>> Well as we don't specify on which value semaphore should wait on, i am
>> prety sure the first ring to increment the semaphore will unblock all
>> waiter. So if you have ring1 that want to wait on ring2 and ring3 as soon as
>> ring2 or ring3 is done ring1 will go one while either ring2 or ring3 might
>> not be done. I will test that tomorrow but from doc i have it seems so. Thus
>> it will be broken with more than one ring, that would mean you have to
>> allocate one semaphore for each ring couple you want to synchronize. Note
>> that the usual case will likely be sync btw 2 ring.
>
> Good point, but I played with it a bit more today and it is just behaving as
> I thought it would be. A single signal command will just unblock a single
> waiter, even if there are multiple waiters currently for this semaphore, the
> only thing you can't tell is which waiter will come first.

I am not talking about multiple waiter but about single waiter on multilple
signaler. Current implementation allocate one and only one semaphore
not matter how much ring you want to synchronize with. Which means
the single waiter will be unblock once only a single ring signal, which is
broken if you want to wait for several rings.

> I should also note that the current algorithm will just emit multiple wait
> operations to a single ring, and spread the signal operations to all other
> rings we are interested in. That isn't very efficient, but should indeed
> work quite fine.

Well the cs patch you posted don't do that. It create one semaphore
and emit a wait with that semaphore and emit signal to all ring. That
was my point. But i agree that creating a semaphore for each ring
couple you want to wait for will work.

>> After retesting the first patch  drm/radeon: fix debugfs handling is NAK
>> a complete no go.
>>
>> Issue is that radeon_debugfs_cleanup is call after rdev is free. This
>> is why i used a static array. I forgot about that, i should have put a
>> comment. I guess you built your kernel without debugfs or that you
>> didn't tested to reload the module.
>
> Mhm, I have tested it, seen the crash, and didn't thought that this is a
> problem. Don't ask me why I can't understand it myself right now.
>
> Anyway, I moved the unregistering of the files into a separate function,
> which is now called from radeon_device_fini instead of
> radeon_debugfs_cleanup. That seems to work fine, at least if I haven't
> missed something else.

Please don't touch the debugfs stuff, it's fine the way it is. No need to
change anything there.

> I also merged your indention fixes and the fix for the never allocated
> semaphores and pushed the result into my public repository
> (http://cgit.freedesktop.org/~deathsimple/linux), so please take another
> look at it.
>

I will take a look

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-11-17 16:24       ` Christian König
  2011-11-17 16:58         ` Jerome Glisse
@ 2011-11-17 20:16         ` Alex Deucher
  2011-11-18 12:44           ` Alex Deucher
  1 sibling, 1 reply; 16+ messages in thread
From: Alex Deucher @ 2011-11-17 20:16 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

2011/11/17 Christian König <deathsimple@vodafone.de>:
> On 16.11.2011 01:24, Jerome Glisse wrote:
>>
>> Well as we don't specify on which value semaphore should wait on, i am
>> prety sure the first ring to increment the semaphore will unblock all
>> waiter. So if you have ring1 that want to wait on ring2 and ring3 as soon as
>> ring2 or ring3 is done ring1 will go one while either ring2 or ring3 might
>> not be done. I will test that tomorrow but from doc i have it seems so. Thus
>> it will be broken with more than one ring, that would mean you have to
>> allocate one semaphore for each ring couple you want to synchronize. Note
>> that the usual case will likely be sync btw 2 ring.
>
> Good point, but I played with it a bit more today and it is just behaving as
> I thought it would be. A single signal command will just unblock a single
> waiter, even if there are multiple waiters currently for this semaphore, the
> only thing you can't tell is which waiter will come first.
>
> I should also note that the current algorithm will just emit multiple wait
> operations to a single ring, and spread the signal operations to all other
> rings we are interested in. That isn't very efficient, but should indeed
> work quite fine.
>
>> After retesting the first patch  drm/radeon: fix debugfs handling is NAK
>> a complete no go.
>>
>> Issue is that radeon_debugfs_cleanup is call after rdev is free. This
>> is why i used a static array. I forgot about that, i should have put a
>> comment. I guess you built your kernel without debugfs or that you
>> didn't tested to reload the module.
>
> Mhm, I have tested it, seen the crash, and didn't thought that this is a
> problem. Don't ask me why I can't understand it myself right now.
>
> Anyway, I moved the unregistering of the files into a separate function,
> which is now called from radeon_device_fini instead of
> radeon_debugfs_cleanup. That seems to work fine, at least if I haven't
> missed something else.
>
> I also merged your indention fixes and the fix for the never allocated
> semaphores and pushed the result into my public repository
> (http://cgit.freedesktop.org/~deathsimple/linux), so please take another
> look at it.

I've got a few other patches to enable further functionality in the
mring patches.
- per ring fence interrupts
- add some additional ring fields to better handle different ring types

http://people.freedesktop.org/~agd5f/mrings/

Alex

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-11-17 16:58         ` Jerome Glisse
@ 2011-11-18 12:19           ` Christian König
  2011-11-18 14:21             ` Jerome Glisse
  0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2011-11-18 12:19 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: dri-devel

On 17.11.2011 17:58, Jerome Glisse wrote:
> 2011/11/17 Christian König<deathsimple@vodafone.de>:
>> On 16.11.2011 01:24, Jerome Glisse wrote:
>>> Well as we don't specify on which value semaphore should wait on, i am
>>> prety sure the first ring to increment the semaphore will unblock all
>>> waiter. So if you have ring1 that want to wait on ring2 and ring3 as soon as
>>> ring2 or ring3 is done ring1 will go one while either ring2 or ring3 might
>>> not be done. I will test that tomorrow but from doc i have it seems so. Thus
>>> it will be broken with more than one ring, that would mean you have to
>>> allocate one semaphore for each ring couple you want to synchronize. Note
>>> that the usual case will likely be sync btw 2 ring.
>> Good point, but I played with it a bit more today and it is just behaving as
>> I thought it would be. A single signal command will just unblock a single
>> waiter, even if there are multiple waiters currently for this semaphore, the
>> only thing you can't tell is which waiter will come first.
> I am not talking about multiple waiter but about single waiter on multilple
> signaler. Current implementation allocate one and only one semaphore
> not matter how much ring you want to synchronize with. Which means
> the single waiter will be unblock once only a single ring signal, which is
> broken if you want to wait for several rings.
Wait a moment, having a single semaphore doesn't means that there is 
only a single waiter, just take a look at the code again:

         for (i = 0; i < RADEON_NUM_RINGS; ++i) {
                 /* no need to sync to our own or unused rings */
                 if (i == p->ring || !p->sync_to_ring[i])
                         continue;

                 if (!p->ib->fence->semaphore) {
                         r = radeon_semaphore_create(p->rdev, 
&p->ib->fence->semaphore);
                         if (r)
                                 return r;
                 }

                 radeon_semaphore_emit_signal(p->rdev, i, 
p->ib->fence->semaphore);
                 radeon_semaphore_emit_wait(p->rdev, p->ring, 
p->ib->fence->semaphore); <--------
         }

It is allocating a single semaphore, thats correct, but at the same time 
it emits multiple wait commands. So if we want to synchronize with just 
one ring, we only wait once on the semaphore, but if we want to 
synchronize with N rings we also wait N times on the same semaphore. 
Since we don't really care about the order in which the signal 
operations arrive that is pretty much ok, because when execution 
continues after the last wait command it has been made sure that every 
signal operation has been fired. The extended semaphore test in my 
repository is also checking for this condition, and it really seems to 
work fine.

>> I should also note that the current algorithm will just emit multiple wait
>> operations to a single ring, and spread the signal operations to all other
>> rings we are interested in. That isn't very efficient, but should indeed
>> work quite fine.
> Well the cs patch you posted don't do that. It create one semaphore
> and emit a wait with that semaphore and emit signal to all ring. That
> was my point. But i agree that creating a semaphore for each ring
> couple you want to wait for will work.
>
>>> After retesting the first patch  drm/radeon: fix debugfs handling is NAK
>>> a complete no go.
>>>
>>> Issue is that radeon_debugfs_cleanup is call after rdev is free. This
>>> is why i used a static array. I forgot about that, i should have put a
>>> comment. I guess you built your kernel without debugfs or that you
>>> didn't tested to reload the module.
>> Mhm, I have tested it, seen the crash, and didn't thought that this is a
>> problem. Don't ask me why I can't understand it myself right now.
>>
>> Anyway, I moved the unregistering of the files into a separate function,
>> which is now called from radeon_device_fini instead of
>> radeon_debugfs_cleanup. That seems to work fine, at least if I haven't
>> missed something else.
> Please don't touch the debugfs stuff, it's fine the way it is. No need to
> change anything there.
At least for me there is need to change something, because using a 
static variable causes the debugfs files to only appear on the first 
card in the system. And in my configuration the first card is always the 
onboard APU, so I wondered for half a day why a locked up IB doesn't 
contains any data, until i finally recognized that I'm looking at the 
wrong GPU in the system.

>> I also merged your indention fixes and the fix for the never allocated
>> semaphores and pushed the result into my public repository
>> (http://cgit.freedesktop.org/~deathsimple/linux), so please take another
>> look at it.
>>
> I will take a look
>
Thanks, but please be aware that I'm going to also integrate Alex 
patches ontop of it today and also give it another test round, just to 
make sure that I'm not doing anything stupid with the debugfs code. So 
if you haven't done already please wait a bit more.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-11-17 20:16         ` Alex Deucher
@ 2011-11-18 12:44           ` Alex Deucher
  2011-11-18 16:34             ` Jerome Glisse
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Deucher @ 2011-11-18 12:44 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

2011/11/17 Alex Deucher <alexdeucher@gmail.com>:
> 2011/11/17 Christian König <deathsimple@vodafone.de>:
>> On 16.11.2011 01:24, Jerome Glisse wrote:
>>>
>>> Well as we don't specify on which value semaphore should wait on, i am
>>> prety sure the first ring to increment the semaphore will unblock all
>>> waiter. So if you have ring1 that want to wait on ring2 and ring3 as soon as
>>> ring2 or ring3 is done ring1 will go one while either ring2 or ring3 might
>>> not be done. I will test that tomorrow but from doc i have it seems so. Thus
>>> it will be broken with more than one ring, that would mean you have to
>>> allocate one semaphore for each ring couple you want to synchronize. Note
>>> that the usual case will likely be sync btw 2 ring.
>>
>> Good point, but I played with it a bit more today and it is just behaving as
>> I thought it would be. A single signal command will just unblock a single
>> waiter, even if there are multiple waiters currently for this semaphore, the
>> only thing you can't tell is which waiter will come first.
>>
>> I should also note that the current algorithm will just emit multiple wait
>> operations to a single ring, and spread the signal operations to all other
>> rings we are interested in. That isn't very efficient, but should indeed
>> work quite fine.
>>
>>> After retesting the first patch  drm/radeon: fix debugfs handling is NAK
>>> a complete no go.
>>>
>>> Issue is that radeon_debugfs_cleanup is call after rdev is free. This
>>> is why i used a static array. I forgot about that, i should have put a
>>> comment. I guess you built your kernel without debugfs or that you
>>> didn't tested to reload the module.
>>
>> Mhm, I have tested it, seen the crash, and didn't thought that this is a
>> problem. Don't ask me why I can't understand it myself right now.
>>
>> Anyway, I moved the unregistering of the files into a separate function,
>> which is now called from radeon_device_fini instead of
>> radeon_debugfs_cleanup. That seems to work fine, at least if I haven't
>> missed something else.
>>
>> I also merged your indention fixes and the fix for the never allocated
>> semaphores and pushed the result into my public repository
>> (http://cgit.freedesktop.org/~deathsimple/linux), so please take another
>> look at it.
>
> I've got a few other patches to enable further functionality in the
> mring patches.
> - per ring fence interrupts
> - add some additional ring fields to better handle different ring types
>
> http://people.freedesktop.org/~agd5f/mrings/
>

FYI, I updated these later last night.

Alex

> Alex
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-11-18 12:19           ` Christian König
@ 2011-11-18 14:21             ` Jerome Glisse
  2011-11-18 15:41               ` Jerome Glisse
  0 siblings, 1 reply; 16+ messages in thread
From: Jerome Glisse @ 2011-11-18 14:21 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

2011/11/18 Christian König <deathsimple@vodafone.de>:
> On 17.11.2011 17:58, Jerome Glisse wrote:
>>
>> 2011/11/17 Christian König<deathsimple@vodafone.de>:
>>>
>>> On 16.11.2011 01:24, Jerome Glisse wrote:
>>>>
>>>> Well as we don't specify on which value semaphore should wait on, i am
>>>> prety sure the first ring to increment the semaphore will unblock all
>>>> waiter. So if you have ring1 that want to wait on ring2 and ring3 as
>>>> soon as
>>>> ring2 or ring3 is done ring1 will go one while either ring2 or ring3
>>>> might
>>>> not be done. I will test that tomorrow but from doc i have it seems so.
>>>> Thus
>>>> it will be broken with more than one ring, that would mean you have to
>>>> allocate one semaphore for each ring couple you want to synchronize.
>>>> Note
>>>> that the usual case will likely be sync btw 2 ring.
>>>
>>> Good point, but I played with it a bit more today and it is just behaving
>>> as
>>> I thought it would be. A single signal command will just unblock a single
>>> waiter, even if there are multiple waiters currently for this semaphore,
>>> the
>>> only thing you can't tell is which waiter will come first.
>>
>> I am not talking about multiple waiter but about single waiter on
>> multilple
>> signaler. Current implementation allocate one and only one semaphore
>> not matter how much ring you want to synchronize with. Which means
>> the single waiter will be unblock once only a single ring signal, which is
>> broken if you want to wait for several rings.
>
> Wait a moment, having a single semaphore doesn't means that there is only a
> single waiter, just take a look at the code again:
>
>        for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>                /* no need to sync to our own or unused rings */
>                if (i == p->ring || !p->sync_to_ring[i])
>                        continue;
>
>                if (!p->ib->fence->semaphore) {
>                        r = radeon_semaphore_create(p->rdev,
> &p->ib->fence->semaphore);
>                        if (r)
>                                return r;
>                }
>
>                radeon_semaphore_emit_signal(p->rdev, i,
> p->ib->fence->semaphore);
>                radeon_semaphore_emit_wait(p->rdev, p->ring,
> p->ib->fence->semaphore); <--------
>        }
>
> It is allocating a single semaphore, thats correct, but at the same time it
> emits multiple wait commands. So if we want to synchronize with just one
> ring, we only wait once on the semaphore, but if we want to synchronize with
> N rings we also wait N times on the same semaphore. Since we don't really
> care about the order in which the signal operations arrive that is pretty
> much ok, because when execution continues after the last wait command it has
> been made sure that every signal operation has been fired. The extended
> semaphore test in my repository is also checking for this condition, and it
> really seems to work fine.

So how does the wait logic works, that's what i am asking.
If semaphore block wait for the addr to be non zero than it doesn't work
If semaphore block wait block until it get a signal than it should work thought
there is a possible race issue in hw if the signal ring wakeup the semaphore
block at the same time do the semaphore block consider this as a single
signal or as 2 signal.

radeon_semaphore_create(&p->ib->fence->semaphore)
radeon_semaphore_emit_signal(ringA, &p->ib->fence->semaphore)
radeon_semaphore_emit_signal(ringB, &p->ib->fence->semaphore)
radeon_semaphore_emit_wait(ringC, &p->ib->fence->semaphore)
radeon_semaphore_emit_wait(ringC, &p->ib->fence->semaphore)

ringC will stall until either ringA or ringB signal if it only wait on non zero

Code you put in you repo yesterday tested following case:
radeon_semaphore_emit_wait(ringA, &p->ib->fence->semaphore)
radeon_semaphore_emit_wait(ringB, &p->ib->fence->semaphore)
radeon_semaphore_emit_signal(ringC, &p->ib->fence->semaphore)

Which is fine, i don't argue on such use case. I argue on :
radeon_semaphore_emit_signal(ringA, &p->ib->fence->semaphore)
radeon_semaphore_emit_signal(ringB, &p->ib->fence->semaphore)
radeon_semaphore_emit_wait(ringC, &p->ib->fence->semaphore)

What i think would be safer is :
radeon_semaphore_emit_signal(ringA, semaphore_ringa_ringc)
radeon_semaphore_emit_signal(ringB, semaphore_ringb_ringc)
radeon_semaphore_emit_wait(ringC,semaphore_ringa_ringc)
radeon_semaphore_emit_wait(ringC,semaphore_ringb_ringc)

Which wouldn't be affected by any possible race.

>>> I should also note that the current algorithm will just emit multiple
>>> wait
>>> operations to a single ring, and spread the signal operations to all
>>> other
>>> rings we are interested in. That isn't very efficient, but should indeed
>>> work quite fine.
>>
>> Well the cs patch you posted don't do that. It create one semaphore
>> and emit a wait with that semaphore and emit signal to all ring. That
>> was my point. But i agree that creating a semaphore for each ring
>> couple you want to wait for will work.
>>
>>>> After retesting the first patch  drm/radeon: fix debugfs handling is NAK
>>>> a complete no go.
>>>>
>>>> Issue is that radeon_debugfs_cleanup is call after rdev is free. This
>>>> is why i used a static array. I forgot about that, i should have put a
>>>> comment. I guess you built your kernel without debugfs or that you
>>>> didn't tested to reload the module.
>>>
>>> Mhm, I have tested it, seen the crash, and didn't thought that this is a
>>> problem. Don't ask me why I can't understand it myself right now.
>>>
>>> Anyway, I moved the unregistering of the files into a separate function,
>>> which is now called from radeon_device_fini instead of
>>> radeon_debugfs_cleanup. That seems to work fine, at least if I haven't
>>> missed something else.
>>
>> Please don't touch the debugfs stuff, it's fine the way it is. No need to
>> change anything there.
>
> At least for me there is need to change something, because using a static
> variable causes the debugfs files to only appear on the first card in the
> system. And in my configuration the first card is always the onboard APU, so
> I wondered for half a day why a locked up IB doesn't contains any data,
> until i finally recognized that I'm looking at the wrong GPU in the system.
>
>>> I also merged your indention fixes and the fix for the never allocated
>>> semaphores and pushed the result into my public repository
>>> (http://cgit.freedesktop.org/~deathsimple/linux), so please take another
>>> look at it.
>>>
>> I will take a look
>>
> Thanks, but please be aware that I'm going to also integrate Alex patches
> ontop of it today and also give it another test round, just to make sure
> that I'm not doing anything stupid with the debugfs code. So if you haven't
> done already please wait a bit more.
>
> Thanks,
> Christian.
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-11-18 14:21             ` Jerome Glisse
@ 2011-11-18 15:41               ` Jerome Glisse
  0 siblings, 0 replies; 16+ messages in thread
From: Jerome Glisse @ 2011-11-18 15:41 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Fri, Nov 18, 2011 at 09:21:50AM -0500, Jerome Glisse wrote:
> 2011/11/18 Christian König <deathsimple@vodafone.de>:
> > On 17.11.2011 17:58, Jerome Glisse wrote:
> >>
> >> 2011/11/17 Christian König<deathsimple@vodafone.de>:
> >>>
> >>> On 16.11.2011 01:24, Jerome Glisse wrote:
> >>>>
> >>>> Well as we don't specify on which value semaphore should wait on, i am
> >>>> prety sure the first ring to increment the semaphore will unblock all
> >>>> waiter. So if you have ring1 that want to wait on ring2 and ring3 as
> >>>> soon as
> >>>> ring2 or ring3 is done ring1 will go one while either ring2 or ring3
> >>>> might
> >>>> not be done. I will test that tomorrow but from doc i have it seems so.
> >>>> Thus
> >>>> it will be broken with more than one ring, that would mean you have to
> >>>> allocate one semaphore for each ring couple you want to synchronize.
> >>>> Note
> >>>> that the usual case will likely be sync btw 2 ring.
> >>>
> >>> Good point, but I played with it a bit more today and it is just behaving
> >>> as
> >>> I thought it would be. A single signal command will just unblock a single
> >>> waiter, even if there are multiple waiters currently for this semaphore,
> >>> the
> >>> only thing you can't tell is which waiter will come first.
> >>
> >> I am not talking about multiple waiter but about single waiter on
> >> multilple
> >> signaler. Current implementation allocate one and only one semaphore
> >> not matter how much ring you want to synchronize with. Which means
> >> the single waiter will be unblock once only a single ring signal, which is
> >> broken if you want to wait for several rings.
> >
> > Wait a moment, having a single semaphore doesn't means that there is only a
> > single waiter, just take a look at the code again:
> >
> >        for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> >                /* no need to sync to our own or unused rings */
> >                if (i == p->ring || !p->sync_to_ring[i])
> >                        continue;
> >
> >                if (!p->ib->fence->semaphore) {
> >                        r = radeon_semaphore_create(p->rdev,
> > &p->ib->fence->semaphore);
> >                        if (r)
> >                                return r;
> >                }
> >
> >                radeon_semaphore_emit_signal(p->rdev, i,
> > p->ib->fence->semaphore);
> >                radeon_semaphore_emit_wait(p->rdev, p->ring,
> > p->ib->fence->semaphore); <--------
> >        }
> >
> > It is allocating a single semaphore, thats correct, but at the same time it
> > emits multiple wait commands. So if we want to synchronize with just one
> > ring, we only wait once on the semaphore, but if we want to synchronize with
> > N rings we also wait N times on the same semaphore. Since we don't really
> > care about the order in which the signal operations arrive that is pretty
> > much ok, because when execution continues after the last wait command it has
> > been made sure that every signal operation has been fired. The extended
> > semaphore test in my repository is also checking for this condition, and it
> > really seems to work fine.
> 
> So how does the wait logic works, that's what i am asking.
> If semaphore block wait for the addr to be non zero than it doesn't work
> If semaphore block wait block until it get a signal than it should work thought
> there is a possible race issue in hw if the signal ring wakeup the semaphore
> block at the same time do the semaphore block consider this as a single
> signal or as 2 signal.
> 
> radeon_semaphore_create(&p->ib->fence->semaphore)
> radeon_semaphore_emit_signal(ringA, &p->ib->fence->semaphore)
> radeon_semaphore_emit_signal(ringB, &p->ib->fence->semaphore)
> radeon_semaphore_emit_wait(ringC, &p->ib->fence->semaphore)
> radeon_semaphore_emit_wait(ringC, &p->ib->fence->semaphore)
> 
> ringC will stall until either ringA or ringB signal if it only wait on non zero
> 
> Code you put in you repo yesterday tested following case:
> radeon_semaphore_emit_wait(ringA, &p->ib->fence->semaphore)
> radeon_semaphore_emit_wait(ringB, &p->ib->fence->semaphore)
> radeon_semaphore_emit_signal(ringC, &p->ib->fence->semaphore)
> 
> Which is fine, i don't argue on such use case. I argue on :
> radeon_semaphore_emit_signal(ringA, &p->ib->fence->semaphore)
> radeon_semaphore_emit_signal(ringB, &p->ib->fence->semaphore)
> radeon_semaphore_emit_wait(ringC, &p->ib->fence->semaphore)
> 
> What i think would be safer is :
> radeon_semaphore_emit_signal(ringA, semaphore_ringa_ringc)
> radeon_semaphore_emit_signal(ringB, semaphore_ringb_ringc)
> radeon_semaphore_emit_wait(ringC,semaphore_ringa_ringc)
> radeon_semaphore_emit_wait(ringC,semaphore_ringb_ringc)
> 
> Which wouldn't be affected by any possible race.
> 
> >>> I should also note that the current algorithm will just emit multiple
> >>> wait
> >>> operations to a single ring, and spread the signal operations to all
> >>> other
> >>> rings we are interested in. That isn't very efficient, but should indeed
> >>> work quite fine.

Ok so i have played with that, and semaphore block never write anything
to memory nor does it read it (at least from what i see). It seems that
the semaphore block queue wait based on addr and unlock wait on signal
one at a time again on addr. Well it does read the value but doesn't
write anything to it afaict. If value is non zero it assume semaphore
have been signaled. My guess is that it's to allow cpu to force a
semaphore to pass.

This lead me to believe that we might be better of using 1 page for all
semaphore and reuse them over and over. Worse case would be when we have
PAGE_SIZE/8 (512 for 4k page) semaphore in flight and we need a new one.
Then we would have to wait for previous semaphore to signal. I think it's
fine given that if we hit that case that means that gpu is super duper
busy with something and that scheduling any new thing can wait (i assume
there won't be gpu with 512 rings or more ;)).

Anyway such simplification can be done as an after patch. I will go over
the serie again as there is a free issue somewhere.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-11-18 12:44           ` Alex Deucher
@ 2011-11-18 16:34             ` Jerome Glisse
  2011-11-20 21:04               ` Jerome Glisse
  0 siblings, 1 reply; 16+ messages in thread
From: Jerome Glisse @ 2011-11-18 16:34 UTC (permalink / raw)
  To: Alex Deucher; +Cc: Christian König, dri-devel

On Fri, Nov 18, 2011 at 07:44:19AM -0500, Alex Deucher wrote:
> 2011/11/17 Alex Deucher <alexdeucher@gmail.com>:
> > 2011/11/17 Christian König <deathsimple@vodafone.de>:
> >> On 16.11.2011 01:24, Jerome Glisse wrote:
> >>>
> >>> Well as we don't specify on which value semaphore should wait on, i am
> >>> prety sure the first ring to increment the semaphore will unblock all
> >>> waiter. So if you have ring1 that want to wait on ring2 and ring3 as soon as
> >>> ring2 or ring3 is done ring1 will go one while either ring2 or ring3 might
> >>> not be done. I will test that tomorrow but from doc i have it seems so. Thus
> >>> it will be broken with more than one ring, that would mean you have to
> >>> allocate one semaphore for each ring couple you want to synchronize. Note
> >>> that the usual case will likely be sync btw 2 ring.
> >>
> >> Good point, but I played with it a bit more today and it is just behaving as
> >> I thought it would be. A single signal command will just unblock a single
> >> waiter, even if there are multiple waiters currently for this semaphore, the
> >> only thing you can't tell is which waiter will come first.
> >>
> >> I should also note that the current algorithm will just emit multiple wait
> >> operations to a single ring, and spread the signal operations to all other
> >> rings we are interested in. That isn't very efficient, but should indeed
> >> work quite fine.
> >>
> >>> After retesting the first patch  drm/radeon: fix debugfs handling is NAK
> >>> a complete no go.
> >>>
> >>> Issue is that radeon_debugfs_cleanup is call after rdev is free. This
> >>> is why i used a static array. I forgot about that, i should have put a
> >>> comment. I guess you built your kernel without debugfs or that you
> >>> didn't tested to reload the module.
> >>
> >> Mhm, I have tested it, seen the crash, and didn't thought that this is a
> >> problem. Don't ask me why I can't understand it myself right now.
> >>
> >> Anyway, I moved the unregistering of the files into a separate function,
> >> which is now called from radeon_device_fini instead of
> >> radeon_debugfs_cleanup. That seems to work fine, at least if I haven't
> >> missed something else.
> >>
> >> I also merged your indention fixes and the fix for the never allocated
> >> semaphores and pushed the result into my public repository
> >> (http://cgit.freedesktop.org/~deathsimple/linux), so please take another
> >> look at it.
> >
> > I've got a few other patches to enable further functionality in the
> > mring patches.
> > - per ring fence interrupts
> > - add some additional ring fields to better handle different ring types
> >
> > http://people.freedesktop.org/~agd5f/mrings/
> >
> 
> FYI, I updated these later last night.
> 
> Alex
> 

Ok so reviewed the patch serie, please Christian keep v2, v3, ...
informations, i find this usefull. I put updated patch at
http://people.freedesktop.org/~glisse/mrings/

Couple of fixes there, indentation, and also i changed the testing
parameter to be a bit flag which make our life easier when we want
to only test the semaphore stuff and not the bo move.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RFC: Radeon multi ring support branch
  2011-11-18 16:34             ` Jerome Glisse
@ 2011-11-20 21:04               ` Jerome Glisse
  0 siblings, 0 replies; 16+ messages in thread
From: Jerome Glisse @ 2011-11-20 21:04 UTC (permalink / raw)
  To: Alex Deucher; +Cc: Christian König, dri-devel

On Fri, Nov 18, 2011 at 11:34 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
> On Fri, Nov 18, 2011 at 07:44:19AM -0500, Alex Deucher wrote:
>> 2011/11/17 Alex Deucher <alexdeucher@gmail.com>:
>> > 2011/11/17 Christian König <deathsimple@vodafone.de>:
>> >> On 16.11.2011 01:24, Jerome Glisse wrote:
>> >>>
>> >>> Well as we don't specify on which value semaphore should wait on, i am
>> >>> prety sure the first ring to increment the semaphore will unblock all
>> >>> waiter. So if you have ring1 that want to wait on ring2 and ring3 as soon as
>> >>> ring2 or ring3 is done ring1 will go one while either ring2 or ring3 might
>> >>> not be done. I will test that tomorrow but from doc i have it seems so. Thus
>> >>> it will be broken with more than one ring, that would mean you have to
>> >>> allocate one semaphore for each ring couple you want to synchronize. Note
>> >>> that the usual case will likely be sync btw 2 ring.
>> >>
>> >> Good point, but I played with it a bit more today and it is just behaving as
>> >> I thought it would be. A single signal command will just unblock a single
>> >> waiter, even if there are multiple waiters currently for this semaphore, the
>> >> only thing you can't tell is which waiter will come first.
>> >>
>> >> I should also note that the current algorithm will just emit multiple wait
>> >> operations to a single ring, and spread the signal operations to all other
>> >> rings we are interested in. That isn't very efficient, but should indeed
>> >> work quite fine.
>> >>
>> >>> After retesting the first patch  drm/radeon: fix debugfs handling is NAK
>> >>> a complete no go.
>> >>>
>> >>> Issue is that radeon_debugfs_cleanup is call after rdev is free. This
>> >>> is why i used a static array. I forgot about that, i should have put a
>> >>> comment. I guess you built your kernel without debugfs or that you
>> >>> didn't tested to reload the module.
>> >>
>> >> Mhm, I have tested it, seen the crash, and didn't thought that this is a
>> >> problem. Don't ask me why I can't understand it myself right now.
>> >>
>> >> Anyway, I moved the unregistering of the files into a separate function,
>> >> which is now called from radeon_device_fini instead of
>> >> radeon_debugfs_cleanup. That seems to work fine, at least if I haven't
>> >> missed something else.
>> >>
>> >> I also merged your indention fixes and the fix for the never allocated
>> >> semaphores and pushed the result into my public repository
>> >> (http://cgit.freedesktop.org/~deathsimple/linux), so please take another
>> >> look at it.
>> >
>> > I've got a few other patches to enable further functionality in the
>> > mring patches.
>> > - per ring fence interrupts
>> > - add some additional ring fields to better handle different ring types
>> >
>> > http://people.freedesktop.org/~agd5f/mrings/
>> >
>>
>> FYI, I updated these later last night.
>>
>> Alex
>>
>
> Ok so reviewed the patch serie, please Christian keep v2, v3, ...
> informations, i find this usefull. I put updated patch at
> http://people.freedesktop.org/~glisse/mrings/
>
> Couple of fixes there, indentation, and also i changed the testing
> parameter to be a bit flag which make our life easier when we want
> to only test the semaphore stuff and not the bo move.
>
> Cheers,
> Jerome
>

Found a major bug introduced by patch 15, rewrote it. Reuploaded
the whole series at
http://people.freedesktop.org/~glisse/mrings/

Issue is that all the fence list should be initialized only once from
asic init callback. Commit message has bigger explanation.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2011-11-20 21:04 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-29 13:00 RFC: Radeon multi ring support branch Christian König
2011-10-31 15:05 ` Jerome Glisse
2011-11-02 10:12   ` Christian König
2011-11-02 14:24     ` Jerome Glisse
2011-11-15 19:32 ` Jerome Glisse
2011-11-15 23:19   ` Christian König
2011-11-16  0:24     ` Jerome Glisse
2011-11-17 16:24       ` Christian König
2011-11-17 16:58         ` Jerome Glisse
2011-11-18 12:19           ` Christian König
2011-11-18 14:21             ` Jerome Glisse
2011-11-18 15:41               ` Jerome Glisse
2011-11-17 20:16         ` Alex Deucher
2011-11-18 12:44           ` Alex Deucher
2011-11-18 16:34             ` Jerome Glisse
2011-11-20 21:04               ` Jerome Glisse

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.