* [PATCH] drm/radeon: make 64bit fences more robust
@ 2012-09-10  9:13 Christian König
  2012-09-10 11:12 ` Michel Dänzer
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2012-09-10  9:13 UTC (permalink / raw)
  To: dri-devel, airlied

Only increase the higher 32bits if we really detect a wrap around.

Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=54129
https://bugs.freedesktop.org/show_bug.cgi?id=54662

Possible fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=846505
https://bugzilla.redhat.com/show_bug.cgi?id=845639

Signed-off-by: Christian König <deathsimple@vodafone.de>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/radeon/radeon_fence.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 7b737b9..4781e13 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -160,7 +160,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
 	do {
 		seq = radeon_fence_read(rdev, ring);
 		seq |= last_seq & 0xffffffff00000000LL;
-		if (seq < last_seq) {
+		if (seq < (last_seq - 0x80000000LL)) {
 			seq += 0x100000000LL;
 		}
 
@@ -811,8 +811,8 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
 	rdev->fence_drv[ring].cpu_addr = NULL;
 	rdev->fence_drv[ring].gpu_addr = 0;
 	for (i = 0; i < RADEON_NUM_RINGS; ++i)
-		rdev->fence_drv[ring].sync_seq[i] = 0;
-	atomic64_set(&rdev->fence_drv[ring].last_seq, 0);
+		rdev->fence_drv[ring].sync_seq[i] = 0x100000000LL;
+	atomic64_set(&rdev->fence_drv[ring].last_seq, 0x100000000LL);
 	rdev->fence_drv[ring].last_activity = jiffies;
 	rdev->fence_drv[ring].initialized = false;
 }
-- 
1.7.9.5

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [PATCH] drm/radeon: make 64bit fences more robust
  2012-09-10  9:13 [PATCH] drm/radeon: make 64bit fences more robust Christian König
@ 2012-09-10 11:12 ` Michel Dänzer
  2012-09-10 12:02   ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Michel Dänzer @ 2012-09-10 11:12 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Mon, 2012-09-10 at 11:13 +0200, Christian König wrote: 
> Only increase the higher 32bits if we really detect a wrap around.
> 
> Fixes:
> https://bugs.freedesktop.org/show_bug.cgi?id=54129
> https://bugs.freedesktop.org/show_bug.cgi?id=54662
> 
> Possible fixes:
> https://bugzilla.redhat.com/show_bug.cgi?id=846505
> https://bugzilla.redhat.com/show_bug.cgi?id=845639
> 
> Signed-off-by: Christian König <deathsimple@vodafone.de>
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/radeon/radeon_fence.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index 7b737b9..4781e13 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -160,7 +160,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>  	do {
>  		seq = radeon_fence_read(rdev, ring);
>  		seq |= last_seq & 0xffffffff00000000LL;
> -		if (seq < last_seq) {
> +		if (seq < (last_seq - 0x80000000LL)) {
>  			seq += 0x100000000LL;
>  		}

Can you provide a bit more explanation for this change? In particular,
how could the code previously detect a wraparound when there was none,
and why is this the proper fix?


-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer


* Re: [PATCH] drm/radeon: make 64bit fences more robust
  2012-09-10 11:12 ` Michel Dänzer
@ 2012-09-10 12:02   ` Christian König
  2012-09-10 15:32     ` Jerome Glisse
  2012-09-10 15:38     ` Michel Dänzer
  0 siblings, 2 replies; 13+ messages in thread
From: Christian König @ 2012-09-10 12:02 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: dri-devel

On 10.09.2012 13:12, Michel Dänzer wrote:
> On Mon, 2012-09-10 at 11:13 +0200, Christian König wrote:
>> Only increase the higher 32bits if we really detect a wrap around.
>>
>> Fixes:
>> https://bugs.freedesktop.org/show_bug.cgi?id=54129
>> https://bugs.freedesktop.org/show_bug.cgi?id=54662
>>
>> Possible fixes:
>> https://bugzilla.redhat.com/show_bug.cgi?id=846505
>> https://bugzilla.redhat.com/show_bug.cgi?id=845639
>>
>> Signed-off-by: Christian König <deathsimple@vodafone.de>
>> Cc: stable@vger.kernel.org
>> ---
>>   drivers/gpu/drm/radeon/radeon_fence.c |    6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
>> index 7b737b9..4781e13 100644
>> --- a/drivers/gpu/drm/radeon/radeon_fence.c
>> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
>> @@ -160,7 +160,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>>   	do {
>>   		seq = radeon_fence_read(rdev, ring);
>>   		seq |= last_seq & 0xffffffff00000000LL;
>> -		if (seq < last_seq) {
>> +		if (seq < (last_seq - 0x80000000LL)) {
>>   			seq += 0x100000000LL;
>>   		}
> Can you provide a bit more explanation for this change? In particular,
> how could the code previously detect a wraparound when there was none,
> and why is this the proper fix?

Honestly I also don't really understand how this bug happened in the 
first place.

We extend the 32bit fences supported by hardware by testing if the 
value we read now is smaller than a previously read fence value:

>		if (seq < last_seq) {

But the problem seems to be that on some systems we do get fence values 
that are decreasing, e.g. instead of 5, 6, 7, 8 we get 5, 7, 6, 8 (or 
maybe 5, 6, 0, 7, 8 because somebody accidentally overwrites the fence 
value).

It might be related to a hardware bug, or the algorithm is flawed in a 
way I currently don't see. Anyway the old code we had wasn't so picky 
about such problems and the patch just tries to make the current code as 
robust as the old code was, which indeed seems to solve the problems we see.

The wrap around detection still works (tested by setting the initial 
fence value to 0xfffffff0 and letting it wrap around shortly after 
start), so I think we can safely commit this.
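
A minimal sketch of the two checks discussed above, written as standalone
functions rather than the actual loop in radeon_fence_process(). The names
extend_seq_old, extend_seq_patched and SEQ_HI_MASK are made up for
illustration; only the comparisons and constants come from the patch.

```c
#include <stdint.h>

#define SEQ_HI_MASK 0xffffffff00000000ULL

/* Original check: any value below last_seq is treated as a wrap around. */
uint64_t extend_seq_old(uint64_t last_seq, uint32_t hw_seq)
{
	uint64_t seq = hw_seq | (last_seq & SEQ_HI_MASK);

	if (seq < last_seq)
		seq += 0x100000000ULL;
	return seq;
}

/*
 * Patched check: only a value more than 2^31 below last_seq is treated
 * as a wrap around. With unsigned arithmetic the subtraction underflows
 * if last_seq is below 0x80000000, which is presumably why the patch
 * also starts last_seq at 0x100000000 (an assumption, not stated in the
 * thread).
 */
uint64_t extend_seq_patched(uint64_t last_seq, uint32_t hw_seq)
{
	uint64_t seq = hw_seq | (last_seq & SEQ_HI_MASK);

	if (seq < last_seq - 0x80000000ULL)
		seq += 0x100000000ULL;
	return seq;
}
```

With last_seq at 0x100000007 and a hardware read of 6 (the "5, 7, 6, 8"
case), the old check bumps the upper 32 bits while the patched check
leaves them alone; a genuine wrap from 0xfffffff0 to a small value is
still detected by both.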

Christian.


* Re: [PATCH] drm/radeon: make 64bit fences more robust
  2012-09-10 12:02   ` Christian König
@ 2012-09-10 15:32     ` Jerome Glisse
  2012-09-10 15:38     ` Michel Dänzer
  1 sibling, 0 replies; 13+ messages in thread
From: Jerome Glisse @ 2012-09-10 15:32 UTC (permalink / raw)
  To: Christian König; +Cc: Michel Dänzer, dri-devel

On Mon, Sep 10, 2012 at 8:02 AM, Christian König
<deathsimple@vodafone.de> wrote:
> On 10.09.2012 13:12, Michel Dänzer wrote:
>>
>> On Mon, 2012-09-10 at 11:13 +0200, Christian König wrote:
>>>
>>> Only increase the higher 32bits if we really detect a wrap around.
>>>
>>> Fixes:
>>> https://bugs.freedesktop.org/show_bug.cgi?id=54129
>>> https://bugs.freedesktop.org/show_bug.cgi?id=54662
>>>
>>> Possible fixes:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=846505
>>> https://bugzilla.redhat.com/show_bug.cgi?id=845639
>>>
>>> Signed-off-by: Christian König <deathsimple@vodafone.de>
>>> Cc: stable@vger.kernel.org
>>> ---
>>>   drivers/gpu/drm/radeon/radeon_fence.c |    6 +++---
>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c
>>> b/drivers/gpu/drm/radeon/radeon_fence.c
>>> index 7b737b9..4781e13 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_fence.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
>>> @@ -160,7 +160,7 @@ void radeon_fence_process(struct radeon_device *rdev,
>>> int ring)
>>>         do {
>>>                 seq = radeon_fence_read(rdev, ring);
>>>                 seq |= last_seq & 0xffffffff00000000LL;
>>> -               if (seq < last_seq) {
>>> +               if (seq < (last_seq - 0x80000000LL)) {
>>>                         seq += 0x100000000LL;
>>>                 }
>>
>> Can you provide a bit more explanation for this change? In particular,
>> how could the code previously detect a wraparound when there was none,
>> and why is this the proper fix?
>
>
> Honestly I also don't really understand how this bug happened in the first
> place.
>
> We extend the 32bit fences supported by hardware by testing if a previously
> read fence value is smaller than the value we read now:
>
>>                 if (seq < last_seq) {
>
>
> But the problem seems to be that on some systems we do get fence values that
> are decreasing, e.g. instead of 5, 6, 7, 8 we get 5, 7, 6, 8 (or maybe 5, 6,
> 0, 7, 8 because somebody accidentally overwrites the fence value).
>
> It might be related to a hardware bug, or the algorithm is flawed in a way I
> currently don't see. Anyway the old code we had wasn't so picky about such
> problems and the patch just tries to make the current code as robust as the
> old code was, which indeed seems to solve the problems we see.
>
> The wrap around detection still works (tested by setting the initial fence
> value to 0xfffffff0 and letting it wrap around shortly after start), so I
> think it we can safely commit this.
>
> Christian.
>

If the fence read ever returned the value 0, then your patch only postpones
the issue until the last fence reaches >= 0x1 8000 0001 (which will take
months/years of uptime to happen :)). Having a log of all emitted fence
values and all read fence values would probably be helpful, so we have
a better clue of what's going on.
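
A small sketch of where that threshold comes from, assuming the patched
comparison and a hardware read of exactly 0. The function name
zero_read_looks_like_wrap is hypothetical and not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if a spurious hardware read of 0 would be taken for a
 * wrap around by the patched check, given the current last_seq.
 */
bool zero_read_looks_like_wrap(uint64_t last_seq)
{
	/* Low 32 bits are the (bogus) hardware value 0. */
	uint64_t seq = last_seq & 0xffffffff00000000ULL;

	return seq < last_seq - 0x80000000ULL;
}
```

Starting from last_seq = 0x1 0000 0000, the condition stays false up to
and including 0x1 8000 0000 and first becomes true at 0x1 8000 0001.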

Sadly, I don't think we receive fences out of order, but rather that we
sometimes receive a 0 fence value; if we did receive fences out of
order, then the issue would also have happened with the previous code.

Otherwise the patch is acked, but please add a better comment about the
fence value most likely being 0 for unknown reasons or being received out
of order (though I don't think so).

Cheers,
Jerome


* Re: [PATCH] drm/radeon: make 64bit fences more robust
  2012-09-10 12:02   ` Christian König
  2012-09-10 15:32     ` Jerome Glisse
@ 2012-09-10 15:38     ` Michel Dänzer
  2012-09-10 15:52       ` Jerome Glisse
  1 sibling, 1 reply; 13+ messages in thread
From: Michel Dänzer @ 2012-09-10 15:38 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Mon, 2012-09-10 at 14:02 +0200, Christian König wrote: 
> On 10.09.2012 13:12, Michel Dänzer wrote:
> > On Mon, 2012-09-10 at 11:13 +0200, Christian König wrote:
> >> Only increase the higher 32bits if we really detect a wrap around.
> >>
> >> Fixes:
> >> https://bugs.freedesktop.org/show_bug.cgi?id=54129
> >> https://bugs.freedesktop.org/show_bug.cgi?id=54662
> >>
> >> Possible fixes:
> >> https://bugzilla.redhat.com/show_bug.cgi?id=846505
> >> https://bugzilla.redhat.com/show_bug.cgi?id=845639
> >>
> >> Signed-off-by: Christian König <deathsimple@vodafone.de>
> >> Cc: stable@vger.kernel.org
> >> ---
> >>   drivers/gpu/drm/radeon/radeon_fence.c |    6 +++---
> >>   1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> >> index 7b737b9..4781e13 100644
> >> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> >> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> >> @@ -160,7 +160,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
> >>   	do {
> >>   		seq = radeon_fence_read(rdev, ring);
> >>   		seq |= last_seq & 0xffffffff00000000LL;
> >> -		if (seq < last_seq) {
> >> +		if (seq < (last_seq - 0x80000000LL)) {
> >>   			seq += 0x100000000LL;
> >>   		}
> > Can you provide a bit more explanation for this change? In particular,
> > how could the code previously detect a wraparound when there was none,
> > and why is this the proper fix?
> 
> Honestly I also don't really understand how this bug happened in the 
> first place.
> 
> We extend the 32bit fences supported by hardware by testing if a 
> previously read fence value is smaller than the value we read now:
> 
> >		if (seq < last_seq) {
> 
> But the problem seems to be that on some systems we do get fence values 
> that are decreasing, e.g. instead of 5, 6, 7, 8 we get 5, 7, 6, 8 (or 
> maybe 5, 6, 0, 7, 8 because somebody accidentally overwrites the fence 
> value).

Maybe some kind of race involving radeon_fence_write()?


> It might be related to a hardware bug, or the algorithm is flawed in a 
> way I currently don't see. Anyway the old code we had wasn't so picky 
> about such problems and the patch just tries to make the current code as 
> robust as the old code was, which indeed seems to solve the problems we see.
> 
> The wrap around detection still works (tested by setting the initial 
> fence value to 0xfffffff0 and letting it wrap around shortly after 
> start), so I think it we can safely commit this.

Without knowing exactly what kind of hardware fence value pattern caused
the problem, we can't be sure that the wraparound handling will work
reliably, or that the values going backwards won't cause other problems.
I think it would be good to get more real-world data on that.


-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer


* Re: [PATCH] drm/radeon: make 64bit fences more robust
  2012-09-10 15:38     ` Michel Dänzer
@ 2012-09-10 15:52       ` Jerome Glisse
  2012-09-10 16:07         ` Jerome Glisse
  0 siblings, 1 reply; 13+ messages in thread
From: Jerome Glisse @ 2012-09-10 15:52 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: dri-devel

On Mon, Sep 10, 2012 at 11:38 AM, Michel Dänzer <michel@daenzer.net> wrote:
> On Mon, 2012-09-10 at 14:02 +0200, Christian König wrote:
>> On 10.09.2012 13:12, Michel Dänzer wrote:
>> > On Mon, 2012-09-10 at 11:13 +0200, Christian König wrote:
>> >> Only increase the higher 32bits if we really detect a wrap around.
>> >>
>> >> Fixes:
>> >> https://bugs.freedesktop.org/show_bug.cgi?id=54129
>> >> https://bugs.freedesktop.org/show_bug.cgi?id=54662
>> >>
>> >> Possible fixes:
>> >> https://bugzilla.redhat.com/show_bug.cgi?id=846505
>> >> https://bugzilla.redhat.com/show_bug.cgi?id=845639
>> >>
>> >> Signed-off-by: Christian König <deathsimple@vodafone.de>
>> >> Cc: stable@vger.kernel.org
>> >> ---
>> >>   drivers/gpu/drm/radeon/radeon_fence.c |    6 +++---
>> >>   1 file changed, 3 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
>> >> index 7b737b9..4781e13 100644
>> >> --- a/drivers/gpu/drm/radeon/radeon_fence.c
>> >> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
>> >> @@ -160,7 +160,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>> >>    do {
>> >>            seq = radeon_fence_read(rdev, ring);
>> >>            seq |= last_seq & 0xffffffff00000000LL;
>> >> -          if (seq < last_seq) {
>> >> +          if (seq < (last_seq - 0x80000000LL)) {
>> >>                    seq += 0x100000000LL;
>> >>            }
>> > Can you provide a bit more explanation for this change? In particular,
>> > how could the code previously detect a wraparound when there was none,
>> > and why is this the proper fix?
>>
>> Honestly I also don't really understand how this bug happened in the
>> first place.
>>
>> We extend the 32bit fences supported by hardware by testing if a
>> previously read fence value is smaller than the value we read now:
>>
>> >             if (seq < last_seq) {
>>
>> But the problem seems to be that on some systems we do get fence values
>> that are decreasing, e.g. instead of 5, 6, 7, 8 we get 5, 7, 6, 8 (or
>> maybe 5, 6, 0, 7, 8 because somebody accidentally overwrites the fence
>> value).
>
> Maybe some kind of race involving radeon_fence_write()?
>
>
>> It might be related to a hardware bug, or the algorithm is flawed in a
>> way I currently don't see. Anyway the old code we had wasn't so picky
>> about such problems and the patch just tries to make the current code as
>> robust as the old code was, which indeed seems to solve the problems we see.
>>
>> The wrap around detection still works (tested by setting the initial
>> fence value to 0xfffffff0 and letting it wrap around shortly after
>> start), so I think it we can safely commit this.
>
> Without knowing exactly what kind of hardware fence value pattern caused
> the problem, we can't be sure that the wraparound handling will work
> reliably, or that the values going backwards won't cause other problems.
> I think it would be good to get more real-world data on that.
>

As I said in my email, this patch just postpones the issue to last_fence >=
0x1 8000 0001 if the fence value we read back is sometimes randomly 0.
If we received fence values out of order (which I highly doubt, as the old
code would have had the same issue, though on a smaller scale), then if
fence value 0x1 8000 0001 is received before fence value 0x1 8000 0000 we
are right back to all future fences being considered signalled (again, this
will take months of uptime).

All this probably leads to questioning the usefulness of 64bit fences.

Cheers,
Jerome


* Re: [PATCH] drm/radeon: make 64bit fences more robust
  2012-09-10 15:52       ` Jerome Glisse
@ 2012-09-10 16:07         ` Jerome Glisse
  2012-09-10 20:13           ` Dave Airlie
  2012-09-11 10:11           ` Christian König
  0 siblings, 2 replies; 13+ messages in thread
From: Jerome Glisse @ 2012-09-10 16:07 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: dri-devel

On Mon, Sep 10, 2012 at 11:52 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
> On Mon, Sep 10, 2012 at 11:38 AM, Michel Dänzer <michel@daenzer.net> wrote:
>> On Mon, 2012-09-10 at 14:02 +0200, Christian König wrote:
>>> On 10.09.2012 13:12, Michel Dänzer wrote:
>>> > On Mon, 2012-09-10 at 11:13 +0200, Christian König wrote:
>>> >> Only increase the higher 32bits if we really detect a wrap around.
>>> >>
>>> >> Fixes:
>>> >> https://bugs.freedesktop.org/show_bug.cgi?id=54129
>>> >> https://bugs.freedesktop.org/show_bug.cgi?id=54662
>>> >>
>>> >> Possible fixes:
>>> >> https://bugzilla.redhat.com/show_bug.cgi?id=846505
>>> >> https://bugzilla.redhat.com/show_bug.cgi?id=845639
>>> >>
>>> >> Signed-off-by: Christian König <deathsimple@vodafone.de>
>>> >> Cc: stable@vger.kernel.org
>>> >> ---
>>> >>   drivers/gpu/drm/radeon/radeon_fence.c |    6 +++---
>>> >>   1 file changed, 3 insertions(+), 3 deletions(-)
>>> >>
>>> >> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
>>> >> index 7b737b9..4781e13 100644
>>> >> --- a/drivers/gpu/drm/radeon/radeon_fence.c
>>> >> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
>>> >> @@ -160,7 +160,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>>> >>    do {
>>> >>            seq = radeon_fence_read(rdev, ring);
>>> >>            seq |= last_seq & 0xffffffff00000000LL;
>>> >> -          if (seq < last_seq) {
>>> >> +          if (seq < (last_seq - 0x80000000LL)) {
>>> >>                    seq += 0x100000000LL;
>>> >>            }
>>> > Can you provide a bit more explanation for this change? In particular,
>>> > how could the code previously detect a wraparound when there was none,
>>> > and why is this the proper fix?
>>>
>>> Honestly I also don't really understand how this bug happened in the
>>> first place.
>>>
>>> We extend the 32bit fences supported by hardware by testing if a
>>> previously read fence value is smaller than the value we read now:
>>>
>>> >             if (seq < last_seq) {
>>>
>>> But the problem seems to be that on some systems we do get fence values
>>> that are decreasing, e.g. instead of 5, 6, 7, 8 we get 5, 7, 6, 8 (or
>>> maybe 5, 6, 0, 7, 8 because somebody accidentally overwrites the fence
>>> value).
>>
>> Maybe some kind of race involving radeon_fence_write()?
>>
>>
>>> It might be related to a hardware bug, or the algorithm is flawed in a
>>> way I currently don't see. Anyway the old code we had wasn't so picky
>>> about such problems and the patch just tries to make the current code as
>>> robust as the old code was, which indeed seems to solve the problems we see.
>>>
>>> The wrap around detection still works (tested by setting the initial
>>> fence value to 0xfffffff0 and letting it wrap around shortly after
>>> start), so I think it we can safely commit this.
>>
>> Without knowing exactly what kind of hardware fence value pattern caused
>> the problem, we can't be sure that the wraparound handling will work
>> reliably, or that the values going backwards won't cause other problems.
>> I think it would be good to get more real-world data on that.
>>
>
> As i said in my email this patch just postpone the issue to last_fence
> >= 0x1 8000 0001 if fence value we read back is sometimes randomly 0.
> If we received fence value out of order (which i highly doubt as old
> code would have had same issue thought on smaller scale) then if fence
> value 0x1 8000 0001 is received before fence value 0x1 8000 0000 we
> are right back to all future fence considered as signalled (again this
> will take month of uptime).

Actually, thinking back about it, if fences are just received out of
order, then this patch's corner case is if we receive 0x1 ffff ffff
after receiving 0x1 0000 0000. What will happen is that the 0x1 0000
0000 is the wrap-over that triggers the upper 32 bits to be incremented,
so the fence becomes 0x2 0000 0000. Then we get 0xffff ffff, which with |
becomes 0x2 ffff ffff. Then we get the next fence value 0x0000 0001 and
again we increment the upper 32 bits, so last seq becomes 0x3 0000 0001.

Again, this will happen after months of uptime, and all it does is
decrease the amount of uptime for which 64bit fences are fine, i.e. at
worst we over-increment by 0x2 0000 0000 instead of 0x1 0000 0000 on
wrap around.
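
The corner case can be replayed with a simplified model of the patched
check. This is only a sketch: fence_step is a made-up helper that adopts
every extended value as the new last_seq, whereas the real
radeon_fence_process() loops and has additional exit conditions.

```c
#include <stdint.h>

/*
 * One step of the patched extension logic: combine the 32-bit hardware
 * value with the upper bits of last_seq, bump the upper bits if the
 * result is more than 2^31 behind, and return it as the new last_seq.
 */
uint64_t fence_step(uint64_t last_seq, uint32_t hw_seq)
{
	uint64_t seq = hw_seq | (last_seq & 0xffffffff00000000ULL);

	if (seq < last_seq - 0x80000000ULL)
		seq += 0x100000000ULL;
	return seq;
}
```

Feeding the hardware values 0x0000 0000, 0xffff ffff, 0x0000 0001 in that
order, starting from last_seq = 0x1 ffff fffe, walks through 0x2 0000 0000
and 0x2 ffff ffff and ends at 0x3 0000 0001, i.e. the upper 32 bits are
incremented twice for a single hardware wrap.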

Cheers,
Jerome

>
> All this probably lead to questioning the usefulness of 64bits fence.
>
> Cheers,
> Jerome


* Re: [PATCH] drm/radeon: make 64bit fences more robust
  2012-09-10 16:07         ` Jerome Glisse
@ 2012-09-10 20:13           ` Dave Airlie
  2012-09-10 21:10             ` Jerome Glisse
  2012-09-11 10:11           ` Christian König
  1 sibling, 1 reply; 13+ messages in thread
From: Dave Airlie @ 2012-09-10 20:13 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: Michel Dänzer, dri-devel

>>>
>>>
>>>> It might be related to a hardware bug, or the algorithm is flawed in a
>>>> way I currently don't see. Anyway the old code we had wasn't so picky
>>>> about such problems and the patch just tries to make the current code as
>>>> robust as the old code was, which indeed seems to solve the problems we see.
>>>>
>>>> The wrap around detection still works (tested by setting the initial
>>>> fence value to 0xfffffff0 and letting it wrap around shortly after
>>>> start), so I think it we can safely commit this.

Can we start fences off so we wrap around after say 15-20 minutes?
That would ensure:

a) it's tested
b) we see failures in a lifetime.

Dave.


* Re: [PATCH] drm/radeon: make 64bit fences more robust
  2012-09-10 20:13           ` Dave Airlie
@ 2012-09-10 21:10             ` Jerome Glisse
  2012-09-10 21:11               ` Jerome Glisse
  0 siblings, 1 reply; 13+ messages in thread
From: Jerome Glisse @ 2012-09-10 21:10 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Michel Dänzer, dri-devel

On Mon, Sep 10, 2012 at 4:13 PM, Dave Airlie <airlied@gmail.com> wrote:
>>>>
>>>>
>>>>> It might be related to a hardware bug, or the algorithm is flawed in a
>>>>> way I currently don't see. Anyway the old code we had wasn't so picky
>>>>> about such problems and the patch just tries to make the current code as
>>>>> robust as the old code was, which indeed seems to solve the problems we see.
>>>>>
>>>>> The wrap around detection still works (tested by setting the initial
>>>>> fence value to 0xfffffff0 and letting it wrap around shortly after
>>>>> start), so I think it we can safely commit this.
>
> Can we start fences off so we wrap around after say 15-20 minutes?
> that would ensure
>
> a) its tested
> b) we see failure in a lifetime.
>
> Dave.

IIRC normal desktop with continuous activities was around 400 fence/minutes.

Anyway, it all depends on what is wrong. Do we sometimes get a 0 as
fence value, or do we get fence values in the wrong order? Depending on
that, the wrap around is a different issue; see my previous mails.

Cheers,
Jerome


* Re: [PATCH] drm/radeon: make 64bit fences more robust
  2012-09-10 21:10             ` Jerome Glisse
@ 2012-09-10 21:11               ` Jerome Glisse
  0 siblings, 0 replies; 13+ messages in thread
From: Jerome Glisse @ 2012-09-10 21:11 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Michel Dänzer, dri-devel

On Mon, Sep 10, 2012 at 5:10 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> On Mon, Sep 10, 2012 at 4:13 PM, Dave Airlie <airlied@gmail.com> wrote:
>>>>>
>>>>>
>>>>>> It might be related to a hardware bug, or the algorithm is flawed in a
>>>>>> way I currently don't see. Anyway the old code we had wasn't so picky
>>>>>> about such problems and the patch just tries to make the current code as
>>>>>> robust as the old code was, which indeed seems to solve the problems we see.
>>>>>>
>>>>>> The wrap around detection still works (tested by setting the initial
>>>>>> fence value to 0xfffffff0 and letting it wrap around shortly after
>>>>>> start), so I think it we can safely commit this.
>>
>> Can we start fences off so we wrap around after say 15-20 minutes?
>> that would ensure
>>
>> a) its tested
>> b) we see failure in a lifetime.
>>
>> Dave.
>
> IIRC normal desktop with continuous activities was around 400 fence/minutes.
>
> Anyway it all depends on what is wrong. Do we sometime get a 0 as
> fence value or do we get fence value in wrong order. Depending on that
> the wrap around is a different issue see my previous mails.
>
> Cheers,
> Jerome

s/minutes/seconds
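
For a rough feel of the time scales, here is a back-of-the-envelope sketch
that takes the corrected figure of about 400 fences per second at face
value. The constant and both helper names are assumptions for illustration
only; real fence rates depend on the workload.

```c
#include <stdint.h>

/* Assumed average fence rate, taken from the estimate in this thread. */
#define FENCES_PER_SECOND 400ULL

/* Seconds until a 32-bit fence counter starting at start_value wraps. */
uint64_t seconds_until_wrap(uint32_t start_value)
{
	return (0x100000000ULL - start_value) / FENCES_PER_SECOND;
}

/* Initial 32-bit fence value that makes the counter wrap after `seconds`. */
uint32_t start_value_for_wrap_after(uint64_t seconds)
{
	return (uint32_t)(0x100000000ULL - seconds * FENCES_PER_SECOND);
}
```

At that rate a counter starting from 0 needs roughly 124 days to wrap,
which fits the "months of uptime" estimate, and a wrap after 15 minutes
would need a start value 360000 below 2^32.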

Jerome


* Re: [PATCH] drm/radeon: make 64bit fences more robust
  2012-09-10 16:07         ` Jerome Glisse
  2012-09-10 20:13           ` Dave Airlie
@ 2012-09-11 10:11           ` Christian König
  2012-09-11 10:23             ` Michel Dänzer
  1 sibling, 1 reply; 13+ messages in thread
From: Christian König @ 2012-09-11 10:11 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: Michel Dänzer, dri-devel

[-- Attachment #1: Type: text/plain, Size: 5255 bytes --]

On 10.09.2012 18:07, Jerome Glisse wrote:
> On Mon, Sep 10, 2012 at 11:52 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
>> On Mon, Sep 10, 2012 at 11:38 AM, Michel Dänzer <michel@daenzer.net> wrote:
>>> On Mon, 2012-09-10 at 14:02 +0200, Christian König wrote:
>>>> On 10.09.2012 13:12, Michel Dänzer wrote:
>>>>> On Mon, 2012-09-10 at 11:13 +0200, Christian König wrote:
>>>>>> Only increase the higher 32bits if we really detect a wrap around.
>>>>>>
>>>>>> Fixes:
>>>>>> https://bugs.freedesktop.org/show_bug.cgi?id=54129
>>>>>> https://bugs.freedesktop.org/show_bug.cgi?id=54662
>>>>>>
>>>>>> Possible fixes:
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=846505
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=845639
>>>>>>
>>>>>> Signed-off-by: Christian König <deathsimple@vodafone.de>
>>>>>> Cc: stable@vger.kernel.org
>>>>>> ---
>>>>>>    drivers/gpu/drm/radeon/radeon_fence.c |    6 +++---
>>>>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
>>>>>> index 7b737b9..4781e13 100644
>>>>>> --- a/drivers/gpu/drm/radeon/radeon_fence.c
>>>>>> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
>>>>>> @@ -160,7 +160,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>>>>>>     do {
>>>>>>             seq = radeon_fence_read(rdev, ring);
>>>>>>             seq |= last_seq & 0xffffffff00000000LL;
>>>>>> -          if (seq < last_seq) {
>>>>>> +          if (seq < (last_seq - 0x80000000LL)) {
>>>>>>                     seq += 0x100000000LL;
>>>>>>             }
>>>>> Can you provide a bit more explanation for this change? In particular,
>>>>> how could the code previously detect a wraparound when there was none,
>>>>> and why is this the proper fix?
>>>> Honestly I also don't really understand how this bug happened in the
>>>> first place.
>>>>
>>>> We extend the 32bit fences supported by hardware by testing if a
>>>> previously read fence value is smaller than the value we read now:
>>>>
>>>>>              if (seq < last_seq) {
>>>> But the problem seems to be that on some systems we do get fence values
>>>> that are decreasing, e.g. instead of 5, 6, 7, 8 we get 5, 7, 6, 8 (or
>>>> maybe 5, 6, 0, 7, 8 because somebody accidentally overwrites the fence
>>>> value).
>>> Maybe some kind of race involving radeon_fence_write()?
>>>
>>>
>>>> It might be related to a hardware bug, or the algorithm is flawed in a
>>>> way I currently don't see. Anyway the old code we had wasn't so picky
>>>> about such problems and the patch just tries to make the current code as
>>>> robust as the old code was, which indeed seems to solve the problems we see.
>>>>
>>>> The wrap around detection still works (tested by setting the initial
>>>> fence value to 0xfffffff0 and letting it wrap around shortly after
>>>> start), so I think it we can safely commit this.
>>> Without knowing exactly what kind of hardware fence value pattern caused
>>> the problem, we can't be sure that the wraparound handling will work
>>> reliably, or that the values going backwards won't cause other problems.
>>> I think it would be good to get more real-world data on that.
>>>
>> As i said in my email this patch just postpone the issue to last_fence
>> >= 0x1 8000 0001 if fence value we read back is sometimes randomly 0.
>> If we received fence value out of order (which i highly doubt as old
>> code would have had same issue thought on smaller scale) then if fence
>> value 0x1 8000 0001 is received before fence value 0x1 8000 0000 we
>> are right back to all future fence considered as signalled (again this
>> will take month of uptime).
> Actually thinking back about it if fence are just received out of
> order then this patch corner case is if we received 0x1 ffff ffff
> after receiving 0x1 0000 0000, what will happen is that the 0x1 0000
> 0000 is the wrap over that will trigger upper 32bits to be incremented
> so fence become 0x2 0000 0000 then we got 0xffff ffff which with |
> become 0x2 ffff ffff then we get next fence value 0x0000 0001 and
> again we increment upper 32bits so last seq become 0x3 0000 0001.
Good point.

> Again this will happen after month of uptime and all it does is
> decrement the amount of uptime for which 64bit fence are fine ie at
> worst we over increment by 0x2 0000 0000 instead of 0x1 0000 0000 on
> wrap around.
How about this idea: Instead of increasing the upper 32bits we just use 
the upper 32bits of the last emitted fence value?
E.g. see the attached patch. That should handle both random zero and out 
of order values more gracefully.
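
A standalone sketch of that idea, modelled on the attached v2 patch but
reduced to a pure function. fence_step_v2 and its last_emitted parameter
are illustrative names; in the patch the emitted value comes from
rdev->fence_drv[ring].sync_seq[ring].

```c
#include <stdint.h>

#define SEQ_HI_MASK 0xffffffff00000000ULL

/*
 * If the extended value went backwards, take the upper 32 bits from the
 * last emitted fence instead of incrementing them. Values that are still
 * not ahead of last_seq are ignored and last_seq is returned unchanged.
 */
uint64_t fence_step_v2(uint64_t last_seq, uint64_t last_emitted,
		       uint32_t hw_seq)
{
	uint64_t seq = hw_seq | (last_seq & SEQ_HI_MASK);

	if (seq < last_seq) {
		seq &= 0xffffffffULL;
		seq |= last_emitted & SEQ_HI_MASK;
	}

	if (seq <= last_seq)
		return last_seq;
	return seq;
}
```

In this model a stray 0 or a late, smaller value leaves last_seq
untouched, while a real wrap is picked up because the last emitted fence
already carries the incremented upper 32 bits.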

Additionally I think that the reason we haven't had this before is that 
this corruption might only happen on hw (re-)initialisation, e.g. boot 
and resume.

Currently I'm hacking together a small test app that just emits an IB 
with some NOP instructions; if I'm not completely wrong, that should 
give us a very high fence rate, so we might be able to actually test 
the wrap around a bit more.

Christian.

> Cheers,
> Jerome
>
>> All this probably leads to questioning the usefulness of 64bit fences.
>>
>> Cheers,
>> Jerome


[-- Attachment #2: 0001-drm-radeon-make-64bit-fences-more-robust-v2.patch --]
[-- Type: text/x-patch; name="0001-drm-radeon-make-64bit-fences-more-robust-v2.patch", Size: 1406 bytes --]

From 8737d17a45e04d7c111abb5e79e48577b224fae6 Mon Sep 17 00:00:00 2001
From: Christian König <deathsimple@vodafone.de>
Date: Sun, 9 Sep 2012 11:45:19 +0200
Subject: [PATCH] drm/radeon: make 64bit fences more robust v2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Only increase the higher 32bits if we really detect a wrap around.

v2: instead of increasing the higher 32bits just use the higher
    32bits from the last emitted fence.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/radeon/radeon_fence.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 7b737b9..a263513 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -161,10 +161,12 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
 		seq = radeon_fence_read(rdev, ring);
 		seq |= last_seq & 0xffffffff00000000LL;
 		if (seq < last_seq) {
-			seq += 0x100000000LL;
+			seq &= 0xffffffff;
+			seq |= rdev->fence_drv[ring].sync_seq[ring] &
+				0xffffffff00000000LL;
 		}
 
-		if (seq == last_seq) {
+		if (seq <= last_seq) {
 			break;
 		}
 		/* If we loop over we don't want to return without
-- 
1.7.9.5



* Re: [PATCH] drm/radeon: make 64bit fences more robust
  2012-09-11 10:11           ` Christian König
@ 2012-09-11 10:23             ` Michel Dänzer
  2012-09-11 16:03               ` Jerome Glisse
  0 siblings, 1 reply; 13+ messages in thread
From: Michel Dänzer @ 2012-09-11 10:23 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Tue, 2012-09-11 at 12:11 +0200, Christian König wrote:
>
> How about this idea: Instead of increasing the upper 32bits, we just use
> the upper 32bits of the last emitted fence value?
> E.g. see the attached patch. That should handle both random zeros and
> out-of-order values more gracefully.

I like this idea.


-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer


* Re: [PATCH] drm/radeon: make 64bit fences more robust
  2012-09-11 10:23             ` Michel Dänzer
@ 2012-09-11 16:03               ` Jerome Glisse
  0 siblings, 0 replies; 13+ messages in thread
From: Jerome Glisse @ 2012-09-11 16:03 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: dri-devel

On Tue, Sep 11, 2012 at 6:23 AM, Michel Dänzer <michel@daenzer.net> wrote:
> On Tue, 2012-09-11 at 12:11 +0200, Christian König wrote:
>>
>> How about this idea: Instead of increasing the upper 32bits, we just use
>> the upper 32bits of the last emitted fence value?
>> E.g. see the attached patch. That should handle both random zeros and
>> out-of-order values more gracefully.
>
> I like this idea.
>

Yeah this patch is better.

Cheers,
Jerome

Thread overview: 13+ messages
2012-09-10  9:13 [PATCH] drm/radeon: make 64bit fences more robust Christian König
2012-09-10 11:12 ` Michel Dänzer
2012-09-10 12:02   ` Christian König
2012-09-10 15:32     ` Jerome Glisse
2012-09-10 15:38     ` Michel Dänzer
2012-09-10 15:52       ` Jerome Glisse
2012-09-10 16:07         ` Jerome Glisse
2012-09-10 20:13           ` Dave Airlie
2012-09-10 21:10             ` Jerome Glisse
2012-09-10 21:11               ` Jerome Glisse
2012-09-11 10:11           ` Christian König
2012-09-11 10:23             ` Michel Dänzer
2012-09-11 16:03               ` Jerome Glisse
