dri-devel.lists.freedesktop.org archive mirror
* [PATCH 1/2] dma-buf: keep only not signaled fence in reservation_object_add_shared_replace v2
@ 2017-10-30 14:59 Christian König
  2017-10-30 14:59 ` [PATCH 2/2] dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace Christian König
       [not found] ` <20171030145904.18584-1-deathsimple-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  0 siblings, 2 replies; 17+ messages in thread
From: Christian König @ 2017-10-30 14:59 UTC (permalink / raw)
  To: dri-devel, amd-gfx

From: Christian König <christian.koenig@amd.com>

The amdgpu issue that required keeping signaled fences in the reservation
objects should be fixed by now.

Optimize the list by keeping only the not-yet-signaled fences around.

v2: temporarily put the signaled fences at the end of the new container

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/reservation.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index b44d9d7db347..6fc794576997 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -145,8 +145,8 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
 				      struct reservation_object_list *fobj,
 				      struct dma_fence *fence)
 {
-	unsigned i;
 	struct dma_fence *old_fence = NULL;
+	unsigned i, j, k;
 
 	dma_fence_get(fence);
 
@@ -162,9 +162,7 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
 	 * references from the old struct are carried over to
 	 * the new.
 	 */
-	fobj->shared_count = old->shared_count;
-
-	for (i = 0; i < old->shared_count; ++i) {
+	for (i = 0, j = 0, k = fobj->shared_max; i < old->shared_count; ++i) {
 		struct dma_fence *check;
 
 		check = rcu_dereference_protected(old->shared[i],
@@ -172,10 +170,18 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
 
 		if (!old_fence && check->context == fence->context) {
 			old_fence = check;
-			RCU_INIT_POINTER(fobj->shared[i], fence);
-		} else
-			RCU_INIT_POINTER(fobj->shared[i], check);
+			RCU_INIT_POINTER(fobj->shared[j++], fence);
+		} else if (!dma_fence_is_signaled(check)) {
+			RCU_INIT_POINTER(fobj->shared[j++], check);
+		} else {
+			/*
 +			 * Temporarily save the signaled fences at the end of the
+			 * new container
+			 */
+			RCU_INIT_POINTER(fobj->shared[--k], check);
+		}
 	}
+	fobj->shared_count = j;
 	if (!old_fence) {
 		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
 		fobj->shared_count++;
@@ -192,10 +198,20 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
 	write_seqcount_end(&obj->seq);
 	preempt_enable();
 
-	if (old)
-		kfree_rcu(old, rcu);
-
 	dma_fence_put(old_fence);
+
+	if (!old)
+		return;
+
+	/* Drop the references to the signaled fences */
+	for (i = k; i < fobj->shared_max; ++i) {
+		struct dma_fence *f;
+
+		f = rcu_dereference_protected(fobj->shared[i],
+					      reservation_object_held(obj));
+		dma_fence_put(f);
+	}
+	kfree_rcu(old, rcu);
 }
 
 /**
-- 
2.11.0
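The core idea of the diff above — pack the unsignaled fences at the front of the new array while parking the signaled ones at the tail so their references can be dropped after the seqcount write section — can be sketched as standalone userspace C. This is a hypothetical simplification: `struct fence` is a stand-in, not the kernel's `struct dma_fence`, and the RCU annotations, refcounting, and same-context replacement are omitted.

```c
#include <assert.h>
#include <stddef.h>

struct fence { int context; int signaled; };

/*
 * Compact 'old_arr' into 'new_arr' (capacity new_max >= 2 * old_count):
 * unsignaled fences go to the front, signaled ones are parked at the
 * tail.  Returns the new shared_count; *tail_start receives the index
 * of the first parked signaled fence, so the caller can later walk
 * [*tail_start, new_max) and drop those references.
 */
static size_t compact_fences(struct fence **old_arr, size_t old_count,
			     struct fence **new_arr, size_t new_max,
			     size_t *tail_start)
{
	size_t i, j = 0, k = new_max;

	for (i = 0; i < old_count; ++i) {
		if (!old_arr[i]->signaled)
			new_arr[j++] = old_arr[i];	/* keep live fence */
		else
			new_arr[--k] = old_arr[i];	/* park for later put */
	}
	*tail_start = k;
	return j;	/* new shared_count */
}
```

Because front and tail grow toward each other in an array at least twice the old count, the two regions can never collide, which is the invariant the patch relies on.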

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [PATCH 2/2] dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace
  2017-10-30 14:59 [PATCH 1/2] dma-buf: keep only not signaled fence in reservation_object_add_shared_replace v2 Christian König
@ 2017-10-30 14:59 ` Christian König
  2017-11-06 16:22   ` Chris Wilson
       [not found] ` <20171030145904.18584-1-deathsimple-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  1 sibling, 1 reply; 17+ messages in thread
From: Christian König @ 2017-10-30 14:59 UTC (permalink / raw)
  To: dri-devel, amd-gfx

From: Christian König <christian.koenig@amd.com>

The amdgpu issue that required keeping signaled fences in the reservation
objects should be fixed by now.

Optimize the handling by replacing a signaled fence when adding a new
shared one.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/reservation.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 6fc794576997..a3928ce9f311 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -104,7 +104,8 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
 				      struct reservation_object_list *fobj,
 				      struct dma_fence *fence)
 {
-	u32 i;
+	struct dma_fence *signaled = NULL;
+	u32 i, signaled_idx;
 
 	dma_fence_get(fence);
 
@@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
 			dma_fence_put(old_fence);
 			return;
 		}
+
+		if (!signaled && dma_fence_is_signaled(old_fence)) {
+			signaled = old_fence;
+			signaled_idx = i;
+		}
 	}
 
 	/*
 	 * memory barrier is added by write_seqcount_begin,
 	 * fobj->shared_count is protected by this lock too
 	 */
-	RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
-	fobj->shared_count++;
+	if (signaled) {
+		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
+	} else {
+		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
+		fobj->shared_count++;
+	}
 
 	write_seqcount_end(&obj->seq);
 	preempt_enable();
+
+	dma_fence_put(signaled);
 }
 
 static void
-- 
2.11.0
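The in-place variant above remembers the first signaled slot it sees while scanning and reuses that slot for the new fence instead of growing the list. A standalone sketch of just that mechanism (hypothetical simplified types; the kernel's same-context replacement, RCU, and locking are omitted here):

```c
#include <assert.h>
#include <stddef.h>

struct fence { int context; int signaled; };

/*
 * Add 'fence' to the shared list: overwrite the first signaled entry
 * if one exists, otherwise append.  Returns the displaced fence (so
 * the caller can drop its reference) or NULL if we appended.
 */
static struct fence *add_shared_inplace(struct fence **shared,
					size_t *count, struct fence *fence)
{
	struct fence *signaled = NULL;
	size_t i, signaled_idx = 0;

	for (i = 0; i < *count; ++i) {
		if (!signaled && shared[i]->signaled) {
			signaled = shared[i];	/* remember dead slot */
			signaled_idx = i;
		}
	}

	if (signaled)
		shared[signaled_idx] = fence;	/* reuse dead slot */
	else
		shared[(*count)++] = fence;	/* grow the list */

	return signaled;
}
```

The payoff is that a list full of signaled fences no longer forces a reallocation: each new fence recycles a dead slot.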


* [PATCH 1/2] dma-buf: keep only not signaled fence in reservation_object_add_shared_replace v2
@ 2017-10-31  8:41 Christian König
  0 siblings, 0 replies; 17+ messages in thread
From: Christian König @ 2017-10-31  8:41 UTC (permalink / raw)
  To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	sumit.semwal-QSEj5FYQhm4dnm+yROfE0A, christian.koenig-5C7GfCeVMHo

From: Christian König <christian.koenig@amd.com>

The amdgpu issue that required keeping signaled fences in the reservation
objects should be fixed by now.

Optimize the list by keeping only the not-yet-signaled fences around.

v2: temporarily put the signaled fences at the end of the new container

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/reservation.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index b44d9d7db347..6fc794576997 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -145,8 +145,8 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
 				      struct reservation_object_list *fobj,
 				      struct dma_fence *fence)
 {
-	unsigned i;
 	struct dma_fence *old_fence = NULL;
+	unsigned i, j, k;
 
 	dma_fence_get(fence);
 
@@ -162,9 +162,7 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
 	 * references from the old struct are carried over to
 	 * the new.
 	 */
-	fobj->shared_count = old->shared_count;
-
-	for (i = 0; i < old->shared_count; ++i) {
+	for (i = 0, j = 0, k = fobj->shared_max; i < old->shared_count; ++i) {
 		struct dma_fence *check;
 
 		check = rcu_dereference_protected(old->shared[i],
@@ -172,10 +170,18 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
 
 		if (!old_fence && check->context == fence->context) {
 			old_fence = check;
-			RCU_INIT_POINTER(fobj->shared[i], fence);
-		} else
-			RCU_INIT_POINTER(fobj->shared[i], check);
+			RCU_INIT_POINTER(fobj->shared[j++], fence);
+		} else if (!dma_fence_is_signaled(check)) {
+			RCU_INIT_POINTER(fobj->shared[j++], check);
+		} else {
+			/*
 +			 * Temporarily save the signaled fences at the end of the
+			 * new container
+			 */
+			RCU_INIT_POINTER(fobj->shared[--k], check);
+		}
 	}
+	fobj->shared_count = j;
 	if (!old_fence) {
 		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
 		fobj->shared_count++;
@@ -192,10 +198,20 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
 	write_seqcount_end(&obj->seq);
 	preempt_enable();
 
-	if (old)
-		kfree_rcu(old, rcu);
-
 	dma_fence_put(old_fence);
+
+	if (!old)
+		return;
+
+	/* Drop the references to the signaled fences */
+	for (i = k; i < fobj->shared_max; ++i) {
+		struct dma_fence *f;
+
+		f = rcu_dereference_protected(fobj->shared[i],
+					      reservation_object_held(obj));
+		dma_fence_put(f);
+	}
+	kfree_rcu(old, rcu);
 }
 
 /**
-- 
2.11.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* [PATCH 1/2] dma-buf: keep only not signaled fence in reservation_object_add_shared_replace v2
@ 2017-10-31  8:43 Christian König
       [not found] ` <20171031084306.1986-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Christian König @ 2017-10-31  8:43 UTC (permalink / raw)
  To: dri-devel, amd-gfx, sumit.semwal, christian.koenig

The amdgpu issue that required keeping signaled fences in the reservation
objects should be fixed by now.

Optimize the list by keeping only the not-yet-signaled fences around.

v2: temporarily put the signaled fences at the end of the new container

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/reservation.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index b44d9d7db347..6fc794576997 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -145,8 +145,8 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
 				      struct reservation_object_list *fobj,
 				      struct dma_fence *fence)
 {
-	unsigned i;
 	struct dma_fence *old_fence = NULL;
+	unsigned i, j, k;
 
 	dma_fence_get(fence);
 
@@ -162,9 +162,7 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
 	 * references from the old struct are carried over to
 	 * the new.
 	 */
-	fobj->shared_count = old->shared_count;
-
-	for (i = 0; i < old->shared_count; ++i) {
+	for (i = 0, j = 0, k = fobj->shared_max; i < old->shared_count; ++i) {
 		struct dma_fence *check;
 
 		check = rcu_dereference_protected(old->shared[i],
@@ -172,10 +170,18 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
 
 		if (!old_fence && check->context == fence->context) {
 			old_fence = check;
-			RCU_INIT_POINTER(fobj->shared[i], fence);
-		} else
-			RCU_INIT_POINTER(fobj->shared[i], check);
+			RCU_INIT_POINTER(fobj->shared[j++], fence);
+		} else if (!dma_fence_is_signaled(check)) {
+			RCU_INIT_POINTER(fobj->shared[j++], check);
+		} else {
+			/*
 +			 * Temporarily save the signaled fences at the end of the
+			 * new container
+			 */
+			RCU_INIT_POINTER(fobj->shared[--k], check);
+		}
 	}
+	fobj->shared_count = j;
 	if (!old_fence) {
 		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
 		fobj->shared_count++;
@@ -192,10 +198,20 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
 	write_seqcount_end(&obj->seq);
 	preempt_enable();
 
-	if (old)
-		kfree_rcu(old, rcu);
-
 	dma_fence_put(old_fence);
+
+	if (!old)
+		return;
+
+	/* Drop the references to the signaled fences */
+	for (i = k; i < fobj->shared_max; ++i) {
+		struct dma_fence *f;
+
+		f = rcu_dereference_protected(fobj->shared[i],
+					      reservation_object_held(obj));
+		dma_fence_put(f);
+	}
+	kfree_rcu(old, rcu);
 }
 
 /**
-- 
2.11.0


* Re: [PATCH 1/2] dma-buf: keep only not signaled fence in reservation_object_add_shared_replace v2
       [not found] ` <20171031084306.1986-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2017-10-31  8:51   ` Chunming Zhou
  2017-10-31  8:55     ` Christian König
  0 siblings, 1 reply; 17+ messages in thread
From: Chunming Zhou @ 2017-10-31  8:51 UTC (permalink / raw)
  To: Christian König, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	sumit.semwal-QSEj5FYQhm4dnm+yROfE0A, christian.koenig-5C7GfCeVMHo



On 31 October 2017 at 16:43, Christian König wrote:
> The amdgpu issue to also need signaled fences in the reservation objects
> should be fixed by now.
>
> Optimize the list by keeping only the not signaled yet fences around.
>
> v2: temporary put the signaled fences at the end of the new container
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/dma-buf/reservation.c | 36 ++++++++++++++++++++++++++----------
>   1 file changed, 26 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index b44d9d7db347..6fc794576997 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -145,8 +145,8 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
>   				      struct reservation_object_list *fobj,
>   				      struct dma_fence *fence)
>   {
> -	unsigned i;
>   	struct dma_fence *old_fence = NULL;
> +	unsigned i, j, k;
>   
>   	dma_fence_get(fence);
>   
> @@ -162,9 +162,7 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
>   	 * references from the old struct are carried over to
>   	 * the new.
>   	 */
> -	fobj->shared_count = old->shared_count;
> -
> -	for (i = 0; i < old->shared_count; ++i) {
> +	for (i = 0, j = 0, k = fobj->shared_max; i < old->shared_count; ++i) {
>   		struct dma_fence *check;
>   
>   		check = rcu_dereference_protected(old->shared[i],
> @@ -172,10 +170,18 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
>   
>   		if (!old_fence && check->context == fence->context) {
>   			old_fence = check;
> -			RCU_INIT_POINTER(fobj->shared[i], fence);
> -		} else
> -			RCU_INIT_POINTER(fobj->shared[i], check);
> +			RCU_INIT_POINTER(fobj->shared[j++], fence);
> +		} else if (!dma_fence_is_signaled(check)) {
> +			RCU_INIT_POINTER(fobj->shared[j++], check);
> +		} else {
> +			/*
> +			 * Temporary save the signaled fences at the end of the
> +			 * new container
> +			 */
> +			RCU_INIT_POINTER(fobj->shared[--k], check);
> +		}
>   	}
> +	fobj->shared_count = j;
>   	if (!old_fence) {
>   		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
It seems you still haven't addressed this: how do you ensure
fobj->shared[fobj->shared_count] is NULL? If it is not NULL, the old one
will be leaked.

Regards,
David Zhou
>   		fobj->shared_count++;
> @@ -192,10 +198,20 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
>   	write_seqcount_end(&obj->seq);
>   	preempt_enable();
>   
> -	if (old)
> -		kfree_rcu(old, rcu);
> -
>   	dma_fence_put(old_fence);
> +
> +	if (!old)
> +		return;
> +
> +	/* Drop the references to the signaled fences */
> +	for (i = k; i < fobj->shared_max; ++i) {
> +		struct dma_fence *f;
> +
> +		f = rcu_dereference_protected(fobj->shared[i],
> +					      reservation_object_held(obj));
> +		dma_fence_put(f);
> +	}
> +	kfree_rcu(old, rcu);
>   }
>   
>   /**


* Re: [PATCH 1/2] dma-buf: keep only not signaled fence in reservation_object_add_shared_replace v2
  2017-10-31  8:51   ` Chunming Zhou
@ 2017-10-31  8:55     ` Christian König
       [not found]       ` <bb7fb308-8e1d-96c1-6474-c1bd35fd4601-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Christian König @ 2017-10-31  8:55 UTC (permalink / raw)
  To: Chunming Zhou, dri-devel, amd-gfx, sumit.semwal, christian.koenig

On 31 October 2017 at 09:51, Chunming Zhou wrote:
>
>
> On 31 October 2017 at 16:43, Christian König wrote:
>> The amdgpu issue to also need signaled fences in the reservation objects
>> should be fixed by now.
>>
>> Optimize the list by keeping only the not signaled yet fences around.
>>
>> v2: temporary put the signaled fences at the end of the new container
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/dma-buf/reservation.c | 36 
>> ++++++++++++++++++++++++++----------
>>   1 file changed, 26 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/dma-buf/reservation.c 
>> b/drivers/dma-buf/reservation.c
>> index b44d9d7db347..6fc794576997 100644
>> --- a/drivers/dma-buf/reservation.c
>> +++ b/drivers/dma-buf/reservation.c
>> @@ -145,8 +145,8 @@ reservation_object_add_shared_replace(struct 
>> reservation_object *obj,
>>                         struct reservation_object_list *fobj,
>>                         struct dma_fence *fence)
>>   {
>> -    unsigned i;
>>       struct dma_fence *old_fence = NULL;
>> +    unsigned i, j, k;
>>         dma_fence_get(fence);
>>   @@ -162,9 +162,7 @@ reservation_object_add_shared_replace(struct 
>> reservation_object *obj,
>>        * references from the old struct are carried over to
>>        * the new.
>>        */
>> -    fobj->shared_count = old->shared_count;
>> -
>> -    for (i = 0; i < old->shared_count; ++i) {
>> +    for (i = 0, j = 0, k = fobj->shared_max; i < old->shared_count; 
>> ++i) {
>>           struct dma_fence *check;
>>             check = rcu_dereference_protected(old->shared[i],
>> @@ -172,10 +170,18 @@ reservation_object_add_shared_replace(struct 
>> reservation_object *obj,
>>             if (!old_fence && check->context == fence->context) {
>>               old_fence = check;
>> -            RCU_INIT_POINTER(fobj->shared[i], fence);
>> -        } else
>> -            RCU_INIT_POINTER(fobj->shared[i], check);
>> +            RCU_INIT_POINTER(fobj->shared[j++], fence);
>> +        } else if (!dma_fence_is_signaled(check)) {
>> +            RCU_INIT_POINTER(fobj->shared[j++], check);
>> +        } else {
>> +            /*
>> +             * Temporary save the signaled fences at the end of the
>> +             * new container
>> +             */
>> +            RCU_INIT_POINTER(fobj->shared[--k], check);
>> +        }
>>       }
>> +    fobj->shared_count = j;
>>       if (!old_fence) {
>> RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
> Seems you still don't address here, how do you sure 
> fobj->shared[fobj->shared_count] is null? if not NULL, the old one 
> will be leak.

I've put them at the end of the container, see "k = fobj->shared_max"
above. Since shared_max >> shared_count (at least twice as large in this
situation) we should be on the safe side.

Regards,
Christian.

>
> Regards,
> David Zhou
>>           fobj->shared_count++;
>> @@ -192,10 +198,20 @@ reservation_object_add_shared_replace(struct 
>> reservation_object *obj,
>>       write_seqcount_end(&obj->seq);
>>       preempt_enable();
>>   -    if (old)
>> -        kfree_rcu(old, rcu);
>> -
>>       dma_fence_put(old_fence);
>> +
>> +    if (!old)
>> +        return;
>> +
>> +    /* Drop the references to the signaled fences */
>> +    for (i = k; i < fobj->shared_max; ++i) {
>> +        struct dma_fence *f;
>> +
>> +        f = rcu_dereference_protected(fobj->shared[i],
>> +                          reservation_object_held(obj));
>> +        dma_fence_put(f);
>> +    }
>> +    kfree_rcu(old, rcu);
>>   }
>>     /**
>


* Re: [PATCH 1/2] dma-buf: keep only not signaled fence in reservation_object_add_shared_replace v2
       [not found]       ` <bb7fb308-8e1d-96c1-6474-c1bd35fd4601-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-10-31  9:10         ` Chunming Zhou
  0 siblings, 0 replies; 17+ messages in thread
From: Chunming Zhou @ 2017-10-31  9:10 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	sumit.semwal-QSEj5FYQhm4dnm+yROfE0A



On 31 October 2017 at 16:55, Christian König wrote:
> On 31 October 2017 at 09:51, Chunming Zhou wrote:
>>
>>
>> On 31 October 2017 at 16:43, Christian König wrote:
>>> The amdgpu issue to also need signaled fences in the reservation 
>>> objects
>>> should be fixed by now.
>>>
>>> Optimize the list by keeping only the not signaled yet fences around.
>>>
>>> v2: temporary put the signaled fences at the end of the new container
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>   drivers/dma-buf/reservation.c | 36 
>>> ++++++++++++++++++++++++++----------
>>>   1 file changed, 26 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/reservation.c 
>>> b/drivers/dma-buf/reservation.c
>>> index b44d9d7db347..6fc794576997 100644
>>> --- a/drivers/dma-buf/reservation.c
>>> +++ b/drivers/dma-buf/reservation.c
>>> @@ -145,8 +145,8 @@ reservation_object_add_shared_replace(struct 
>>> reservation_object *obj,
>>>                         struct reservation_object_list *fobj,
>>>                         struct dma_fence *fence)
>>>   {
>>> -    unsigned i;
>>>       struct dma_fence *old_fence = NULL;
>>> +    unsigned i, j, k;
>>>         dma_fence_get(fence);
>>>   @@ -162,9 +162,7 @@ reservation_object_add_shared_replace(struct 
>>> reservation_object *obj,
>>>        * references from the old struct are carried over to
>>>        * the new.
>>>        */
>>> -    fobj->shared_count = old->shared_count;
>>> -
>>> -    for (i = 0; i < old->shared_count; ++i) {
>>> +    for (i = 0, j = 0, k = fobj->shared_max; i < old->shared_count; 
>>> ++i) {
>>>           struct dma_fence *check;
>>>             check = rcu_dereference_protected(old->shared[i],
>>> @@ -172,10 +170,18 @@ reservation_object_add_shared_replace(struct 
>>> reservation_object *obj,
>>>             if (!old_fence && check->context == fence->context) {
>>>               old_fence = check;
>>> -            RCU_INIT_POINTER(fobj->shared[i], fence);
>>> -        } else
>>> -            RCU_INIT_POINTER(fobj->shared[i], check);
>>> +            RCU_INIT_POINTER(fobj->shared[j++], fence);
>>> +        } else if (!dma_fence_is_signaled(check)) {
>>> +            RCU_INIT_POINTER(fobj->shared[j++], check);
>>> +        } else {
>>> +            /*
>>> +             * Temporary save the signaled fences at the end of the
>>> +             * new container
>>> +             */
>>> +            RCU_INIT_POINTER(fobj->shared[--k], check);
>>> +        }
>>>       }
>>> +    fobj->shared_count = j;
>>>       if (!old_fence) {
>>> RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
>> Seems you still don't address here, how do you sure 
>> fobj->shared[fobj->shared_count] is null? if not NULL, the old one 
>> will be leak.
>
> I've put them at the end of the container, see above "k = 
> fobj->shared_max". Since shared_max >> shared_count (at least twice as 
> large in this situation) we should be on the safe side.
Ah, oops, Reviewed-by: Chunming Zhou <david1.zhou@amd.com> for the series.

Thanks,
David Zhou
>
> Regards,
> Christian.
>
>>
>> Regards,
>> David Zhou
>>>           fobj->shared_count++;
>>> @@ -192,10 +198,20 @@ reservation_object_add_shared_replace(struct 
>>> reservation_object *obj,
>>>       write_seqcount_end(&obj->seq);
>>>       preempt_enable();
>>>   -    if (old)
>>> -        kfree_rcu(old, rcu);
>>> -
>>>       dma_fence_put(old_fence);
>>> +
>>> +    if (!old)
>>> +        return;
>>> +
>>> +    /* Drop the references to the signaled fences */
>>> +    for (i = k; i < fobj->shared_max; ++i) {
>>> +        struct dma_fence *f;
>>> +
>>> +        f = rcu_dereference_protected(fobj->shared[i],
>>> +                          reservation_object_held(obj));
>>> +        dma_fence_put(f);
>>> +    }
>>> +    kfree_rcu(old, rcu);
>>>   }
>>>     /**
>>
>


* Re: [PATCH 1/2] dma-buf: keep only not signaled fence in reservation_object_add_shared_replace v2
       [not found] ` <20171030145904.18584-1-deathsimple-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2017-11-02 12:26   ` Sumit Semwal
  2017-11-06 16:18   ` Chris Wilson
  1 sibling, 0 replies; 17+ messages in thread
From: Sumit Semwal @ 2017-11-02 12:26 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx list, DRI mailing list

Hi Christian,

On 30 October 2017 at 20:29, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
> From: Christian König <christian.koenig@amd.com>
>
> The amdgpu issue to also need signaled fences in the reservation objects
> should be fixed by now.
>
> Optimize the list by keeping only the not signaled yet fences around.

I think this warrants some more context in the commit log. Could you
please do the same for the other patch as well?
>
> v2: temporary put the signaled fences at the end of the new container
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/reservation.c | 36 ++++++++++++++++++++++++++----------
>  1 file changed, 26 insertions(+), 10 deletions(-)
>

Best,
Sumit.

* Re: [PATCH 1/2] dma-buf: keep only not signaled fence in reservation_object_add_shared_replace v2
       [not found] ` <20171030145904.18584-1-deathsimple-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  2017-11-02 12:26   ` [PATCH 1/2] dma-buf: keep only not signaled fence in reservation_object_add_shared_replace v2 Sumit Semwal
@ 2017-11-06 16:18   ` Chris Wilson
  1 sibling, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2017-11-06 16:18 UTC (permalink / raw)
  To: Christian König, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Quoting Christian König (2017-10-30 14:59:03)
> From: Christian König <christian.koenig@amd.com>
> 
> The amdgpu issue to also need signaled fences in the reservation objects
> should be fixed by now.
> 
> Optimize the list by keeping only the not signaled yet fences around.
> 
> v2: temporary put the signaled fences at the end of the new container
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/reservation.c | 36 ++++++++++++++++++++++++++----------
>  1 file changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index b44d9d7db347..6fc794576997 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -145,8 +145,8 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
>                                       struct reservation_object_list *fobj,
>                                       struct dma_fence *fence)
>  {
> -       unsigned i;
>         struct dma_fence *old_fence = NULL;
> +       unsigned i, j, k;
>  
>         dma_fence_get(fence);
>  
> @@ -162,9 +162,7 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
>          * references from the old struct are carried over to
>          * the new.
>          */
> -       fobj->shared_count = old->shared_count;
> -
> -       for (i = 0; i < old->shared_count; ++i) {
> +       for (i = 0, j = 0, k = fobj->shared_max; i < old->shared_count; ++i) {
>                 struct dma_fence *check;
>  
>                 check = rcu_dereference_protected(old->shared[i],
> @@ -172,10 +170,18 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
>  
>                 if (!old_fence && check->context == fence->context) {
>                         old_fence = check;
> -                       RCU_INIT_POINTER(fobj->shared[i], fence);
> -               } else
> -                       RCU_INIT_POINTER(fobj->shared[i], check);
> +                       RCU_INIT_POINTER(fobj->shared[j++], fence);
> +               } else if (!dma_fence_is_signaled(check)) {
> +                       RCU_INIT_POINTER(fobj->shared[j++], check);
> +               } else {
> +                       /*
> +                        * Temporary save the signaled fences at the end of the
> +                        * new container
> +                        */
> +                       RCU_INIT_POINTER(fobj->shared[--k], check);
> +               }
>         }
> +       fobj->shared_count = j;
>         if (!old_fence) {
>                 RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
>                 fobj->shared_count++;
> @@ -192,10 +198,20 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
>         write_seqcount_end(&obj->seq);
>         preempt_enable();
>  
> -       if (old)
> -               kfree_rcu(old, rcu);
> -
>         dma_fence_put(old_fence);

We don't need to keep old_fence (local var) around any more in this
scheme. By construction, the new shared_list is large enough to hold all
the old_fences+1, so you can use a loop like

j = 0;
k = fobj->shared_max;
RCU_INIT_POINTER(fobj->shared[j++], fence);
for (i = 0; i < old->shared_count; ++i) {
	struct dma_fence *check;

	check = rcu_dereference_protected(old->shared[i],
					  reservation_object_held(obj));

	if (check->context == fence->context ||
	    dma_fence_is_signaled(check))
		RCU_INIT_POINTER(fobj->shared[--k], check);
	else
		RCU_INIT_POINTER(fobj->shared[j++], check);
}
fobj->shared_count = j;

> +
> +       if (!old)
> +               return;
> +
> +       /* Drop the references to the signaled fences */
> +       for (i = k; i < fobj->shared_max; ++i) {
> +               struct dma_fence *f;
> +
> +               f = rcu_dereference_protected(fobj->shared[i],
> +                                             reservation_object_held(obj));
> +               dma_fence_put(f);
> +       }
> +       kfree_rcu(old, rcu);
>  }

Fwiw, this fixes a big problem with the lack of pruning of the
reservation_object, so \o/
-Chris
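Chris's restructured loop can be exercised standalone. The sketch below (hypothetical simplified types, no RCU or refcounting) shows that with the new fence placed first and both the same-context fence and any signaled fences demoted to the tail, a single loop over the tail region can drop all the displaced references:

```c
#include <assert.h>
#include <stddef.h>

struct fence { int context; int signaled; };

/*
 * Chris's variant: install the new fence first, then demote both the
 * same-context fence and any signaled fences to the tail so one loop
 * over [*tail_start, new_max) can put them all.  Returns the new
 * shared_count.
 */
static size_t replace_compact(struct fence **old_arr, size_t old_count,
			      struct fence **new_arr, size_t new_max,
			      struct fence *fence, size_t *tail_start)
{
	size_t i, j = 0, k = new_max;

	new_arr[j++] = fence;	/* new fence always kept */
	for (i = 0; i < old_count; ++i) {
		struct fence *check = old_arr[i];

		if (check->context == fence->context || check->signaled)
			new_arr[--k] = check;	/* displaced: put later */
		else
			new_arr[j++] = check;	/* still live: keep */
	}
	*tail_start = k;
	return j;
}
```

This removes the need for the separate old_fence local: the replaced same-context fence is simply one more entry in the tail region that gets put alongside the signaled ones.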

* Re: [PATCH 2/2] dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace
  2017-10-30 14:59 ` [PATCH 2/2] dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace Christian König
@ 2017-11-06 16:22   ` Chris Wilson
       [not found]     ` <150998532204.10527.4781311190552781355-M6iVdVfohj6unts5RBS2dVaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2017-11-06 16:22 UTC (permalink / raw)
  To: Christian König, dri-devel, amd-gfx

Quoting Christian König (2017-10-30 14:59:04)
> From: Christian König <christian.koenig@amd.com>
> 
> The amdgpu issue to also need signaled fences in the reservation objects should
> be fixed by now.
> 
> Optimize the handling by replacing a signaled fence when adding a new
> shared one.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/reservation.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 6fc794576997..a3928ce9f311 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -104,7 +104,8 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>                                       struct reservation_object_list *fobj,
>                                       struct dma_fence *fence)
>  {
> -       u32 i;
> +       struct dma_fence *signaled = NULL;
> +       u32 i, signaled_idx;
>  
>         dma_fence_get(fence);
>  
> @@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>                         dma_fence_put(old_fence);
>                         return;
>                 }
> +
> +               if (!signaled && dma_fence_is_signaled(old_fence)) {
> +                       signaled = old_fence;
> +                       signaled_idx = i;
> +               }

How much do we care about only keeping one fence per-ctx here? You could
rearrange this to break on old_fence->context == fence->context ||
dma_fence_is_signaled(old_fence); then everyone can use the final block.
-Chris
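
The suggested rearrangement could be modelled outside the kernel roughly like this (hypothetical toy types; the real function additionally takes the seqcount write lock and uses RCU pointer accessors). The first call in the usage below also shows the caveat under discussion: a signaled fence from another context can be replaced first, so two fences from one context may end up coexisting.

```c
#include <assert.h>
#include <stddef.h>

struct ctx_fence { unsigned context; int signaled; };

/*
 * Sketch of the combined break condition: reuse the first slot whose
 * fence shares the incoming fence's context OR is already signaled;
 * only grow shared_count if the scan falls off the end.  Returns the
 * replaced fence (NULL if appended), mirroring the dma_fence_put()
 * the caller would do.
 */
struct ctx_fence *add_shared_inplace(struct ctx_fence **shared,
				     size_t *shared_count,
				     struct ctx_fence *fence)
{
	size_t i;

	for (i = 0; i < *shared_count; ++i) {
		struct ctx_fence *check = shared[i];

		if (check->context == fence->context || check->signaled) {
			shared[i] = fence;	/* reuse the slot */
			return check;		/* caller drops this ref */
		}
	}
	shared[(*shared_count)++] = fence;	/* no candidate: append */
	return NULL;
}
```

In the usage below, the new context-2 fence lands in the slot of the signaled context-1 fence, leaving two context-2 entries in the list — the one-fence-per-ctx guarantee is lost.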
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [PATCH 2/2] dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace
       [not found]     ` <150998532204.10527.4781311190552781355-M6iVdVfohj6unts5RBS2dVaTQe2KTcn/@public.gmane.org>
@ 2017-11-14 14:24       ` Christian König
       [not found]         ` <4fc05061-e92d-1702-a0f6-ee337b7d3eb3-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Christian König @ 2017-11-14 14:24 UTC (permalink / raw)
  To: Chris Wilson, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 06.11.2017 um 17:22 schrieb Chris Wilson:
> Quoting Christian König (2017-10-30 14:59:04)
>> From: Christian König <christian.koenig@amd.com>
>>
>> The amdgpu issue to also need signaled fences in the reservation objects should
>> be fixed by now.
>>
>> Optimize the handling by replacing a signaled fence when adding a new
>> shared one.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/dma-buf/reservation.c | 18 +++++++++++++++---
>>   1 file changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
>> index 6fc794576997..a3928ce9f311 100644
>> --- a/drivers/dma-buf/reservation.c
>> +++ b/drivers/dma-buf/reservation.c
>> @@ -104,7 +104,8 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>>                                        struct reservation_object_list *fobj,
>>                                        struct dma_fence *fence)
>>   {
>> -       u32 i;
>> +       struct dma_fence *signaled = NULL;
>> +       u32 i, signaled_idx;
>>   
>>          dma_fence_get(fence);
>>   
>> @@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>>                          dma_fence_put(old_fence);
>>                          return;
>>                  }
>> +
>> +               if (!signaled && dma_fence_is_signaled(old_fence)) {
>> +                       signaled = old_fence;
>> +                       signaled_idx = i;
>> +               }
> How much do we care about only keeping one fence per-ctx here? You could
> rearrange this to break on old_fence->context == fence->context ||
> dma_fence_is_signaled(old_fence) then everyone can use the final block.

Yeah, that is what David Zhou suggested as well.

I've rejected this approach for now because I think we still have cases 
where we rely on one fence per ctx (but I'm not 100% sure).

I changed patch #1 in this series as you suggested and am going to send 
that out once more in a minute.

Can we get this upstream as is for now? I won't have much more time 
working on this.

Regards,
Christian.

> -Chris



* Re: [PATCH 2/2] dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace
       [not found]         ` <4fc05061-e92d-1702-a0f6-ee337b7d3eb3-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-11-14 14:34           ` Chris Wilson
  2017-11-15 16:55             ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2017-11-14 14:34 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo, Christian König,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Quoting Christian König (2017-11-14 14:24:44)
> Am 06.11.2017 um 17:22 schrieb Chris Wilson:
> > Quoting Christian König (2017-10-30 14:59:04)
> >> From: Christian König <christian.koenig@amd.com>
> >>
> >> The amdgpu issue to also need signaled fences in the reservation objects should
> >> be fixed by now.
> >>
> >> Optimize the handling by replacing a signaled fence when adding a new
> >> shared one.
> >>
> >> Signed-off-by: Christian König <christian.koenig@amd.com>
> >> ---
> >>   drivers/dma-buf/reservation.c | 18 +++++++++++++++---
> >>   1 file changed, 15 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> >> index 6fc794576997..a3928ce9f311 100644
> >> --- a/drivers/dma-buf/reservation.c
> >> +++ b/drivers/dma-buf/reservation.c
> >> @@ -104,7 +104,8 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
> >>                                        struct reservation_object_list *fobj,
> >>                                        struct dma_fence *fence)
> >>   {
> >> -       u32 i;
> >> +       struct dma_fence *signaled = NULL;
> >> +       u32 i, signaled_idx;
> >>   
> >>          dma_fence_get(fence);
> >>   
> >> @@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
> >>                          dma_fence_put(old_fence);
> >>                          return;
> >>                  }
> >> +
> >> +               if (!signaled && dma_fence_is_signaled(old_fence)) {
> >> +                       signaled = old_fence;
> >> +                       signaled_idx = i;
> >> +               }
> > How much do we care about only keeping one fence per-ctx here? You could
> > rearrange this to break on old_fence->context == fence->context ||
> > dma_fence_is_signaled(old_fence) then everyone can use the final block.
> 
> Yeah, that is what David Zhou suggested as well.
> 
> I've rejected this approach for now cause I think we still have cases 
> where we rely on one fence per ctx (but I'm not 100% sure).
> 
> I changed patch #1 in this series as you suggest and going to send that 
> out once more in a minute.
> 
> Can we get this upstream as is for now? I won't have much more time 
> working on this.

Sure, we are only discussing how we might make it look tidier, pure
micro-optimisation with the caveat of losing the one-fence-per-ctx
guarantee.
-Chris

* Re: [PATCH 2/2] dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace
  2017-11-14 14:34           ` Chris Wilson
@ 2017-11-15 16:55             ` Chris Wilson
  2017-11-15 17:34               ` Christian König
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2017-11-15 16:55 UTC (permalink / raw)
  To: christian.koenig, Christian König, dri-devel, amd-gfx

Quoting Chris Wilson (2017-11-14 14:34:05)
> Quoting Christian König (2017-11-14 14:24:44)
> > Am 06.11.2017 um 17:22 schrieb Chris Wilson:
> > > Quoting Christian König (2017-10-30 14:59:04)
> > >> @@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
> > >>                          dma_fence_put(old_fence);
> > >>                          return;
> > >>                  }
> > >> +
> > >> +               if (!signaled && dma_fence_is_signaled(old_fence)) {
> > >> +                       signaled = old_fence;
> > >> +                       signaled_idx = i;
> > >> +               }
> > > How much do we care about only keeping one fence per-ctx here? You could
> > > rearrange this to break on old_fence->context == fence->context ||
> > > dma_fence_is_signaled(old_fence) then everyone can use the final block.
> > 
> > Yeah, that is what David Zhou suggested as well.
> > 
> > I've rejected this approach for now cause I think we still have cases 
> > where we rely on one fence per ctx (but I'm not 100% sure).
> > 
> > I changed patch #1 in this series as you suggest and going to send that 
> > out once more in a minute.
> > 
> > Can we get this upstream as is for now? I won't have much more time 
> > working on this.
> 
> Sure, we are only discussing how we might make it look tidier, pure
> micro-optimisation with the caveat of losing the one-fence-per-ctx
> guarantee.

Ah, one thing to note is that the extra checking pushed one of our
corner-case tests over its time limit.

If we can completely forgo the one-fence-per-ctx here, what works really
well in testing is

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 5319ac478918..5755e95fab1b 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -104,39 +104,19 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
                                      struct reservation_object_list *fobj,
                                      struct dma_fence *fence)
 {
-       struct dma_fence *replace = NULL;
-       u32 ctx = fence->context;
-       u32 i;
-
        dma_fence_get(fence);
 
        preempt_disable();
        write_seqcount_begin(&obj->seq);
 
-       for (i = 0; i < fobj->shared_count; ++i) {
-               struct dma_fence *check;
-
-               check = rcu_dereference_protected(fobj->shared[i],
-                                                 reservation_object_held(obj));
-
-               if (check->context == ctx || dma_fence_is_signaled(check)) {
-                       replace = old_fence;
-                       break;
-               }
-       }
-
        /*
         * memory barrier is added by write_seqcount_begin,
         * fobj->shared_count is protected by this lock too
         */
-       RCU_INIT_POINTER(fobj->shared[i], fence);
-       if (!replace)
-               fobj->shared_count++;
+       RCU_INIT_POINTER(fobj->shared[fobj->shared_count++], fence);
 
        write_seqcount_end(&obj->seq);
        preempt_enable();
-
-       dma_fence_put(replace);
 }
 
 static void

 i.e. don't check when not replacing an entry in shared[]; on creating the
 new buffer we then discard all the old fences.

It should work for amdgpu as well since you do a hash table (ht) lookup to
coalesce redundant fences before queuing.
-Chris

* Re: [PATCH 2/2] dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace
  2017-11-15 16:55             ` Chris Wilson
@ 2017-11-15 17:34               ` Christian König
       [not found]                 ` <1a565164-85cf-0678-b6ab-802ec047bb91-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Christian König @ 2017-11-15 17:34 UTC (permalink / raw)
  To: Chris Wilson, Christian König, dri-devel, amd-gfx

Am 15.11.2017 um 17:55 schrieb Chris Wilson:
> Quoting Chris Wilson (2017-11-14 14:34:05)
>> Quoting Christian König (2017-11-14 14:24:44)
>>> Am 06.11.2017 um 17:22 schrieb Chris Wilson:
>>>> Quoting Christian König (2017-10-30 14:59:04)
>>>>> @@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>>>>>                           dma_fence_put(old_fence);
>>>>>                           return;
>>>>>                   }
>>>>> +
>>>>> +               if (!signaled && dma_fence_is_signaled(old_fence)) {
>>>>> +                       signaled = old_fence;
>>>>> +                       signaled_idx = i;
>>>>> +               }
>>>> How much do we care about only keeping one fence per-ctx here? You could
>>>> rearrange this to break on old_fence->context == fence->context ||
>>>> dma_fence_is_signaled(old_fence) then everyone can use the final block.
>>> Yeah, that is what David Zhou suggested as well.
>>>
>>> I've rejected this approach for now cause I think we still have cases
>>> where we rely on one fence per ctx (but I'm not 100% sure).
>>>
>>> I changed patch #1 in this series as you suggest and going to send that
>>> out once more in a minute.
>>>
>>> Can we get this upstream as is for now? I won't have much more time
>>> working on this.
>> Sure, we are only discussing how we might make it look tidier, pure
>> micro-optimisation with the caveat of losing the one-fence-per-ctx
>> guarantee.
> Ah, one thing to note is that extra checking pushed one of our corner
> case tests over its time limit.
>
> If we can completely forgo the one-fence-per-ctx here, what works really
> well in testing is
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 5319ac478918..5755e95fab1b 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -104,39 +104,19 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>                                        struct reservation_object_list *fobj,
>                                        struct dma_fence *fence)
>   {
> -       struct dma_fence *replace = NULL;
> -       u32 ctx = fence->context;
> -       u32 i;
> -
>          dma_fence_get(fence);
>   
>          preempt_disable();
>          write_seqcount_begin(&obj->seq);
>   
> -       for (i = 0; i < fobj->shared_count; ++i) {
> -               struct dma_fence *check;
> -
> -               check = rcu_dereference_protected(fobj->shared[i],
> -                                                 reservation_object_held(obj));
> -
> -               if (check->context == ctx || dma_fence_is_signaled(check)) {
> -                       replace = old_fence;
> -                       break;
> -               }
> -       }
> -
>          /*
>           * memory barrier is added by write_seqcount_begin,
>           * fobj->shared_count is protected by this lock too
>           */
> -       RCU_INIT_POINTER(fobj->shared[i], fence);
> -       if (!replace)
> -               fobj->shared_count++;
> +       RCU_INIT_POINTER(fobj->shared[fobj->shared_count++], fence);
>   
>          write_seqcount_end(&obj->seq);
>          preempt_enable();
> -
> -       dma_fence_put(replace);
>   }
>   
>   static void
>
>   i.e. don't check when not replacing the shared[], on creating the new
>   buffer we then discard all the old fences.
>
> It should work for amdgpu as well since you do a ht to coalesce
> redundant fences before queuing.

That won't work for all cases. This way the reservation object would 
keep growing without a chance to ever shrink.

Christian.

> -Chris



* Re: [PATCH 2/2] dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace
       [not found]                 ` <1a565164-85cf-0678-b6ab-802ec047bb91-5C7GfCeVMHo@public.gmane.org>
@ 2017-11-15 17:43                   ` Chris Wilson
  2017-11-15 18:56                     ` Christian König
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2017-11-15 17:43 UTC (permalink / raw)
  To: Christian König, Christian König,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Quoting Christian König (2017-11-15 17:34:07)
> Am 15.11.2017 um 17:55 schrieb Chris Wilson:
> > Quoting Chris Wilson (2017-11-14 14:34:05)
> >> Quoting Christian König (2017-11-14 14:24:44)
> >>> Am 06.11.2017 um 17:22 schrieb Chris Wilson:
> >>>> Quoting Christian König (2017-10-30 14:59:04)
> >>>>> @@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
> >>>>>                           dma_fence_put(old_fence);
> >>>>>                           return;
> >>>>>                   }
> >>>>> +
> >>>>> +               if (!signaled && dma_fence_is_signaled(old_fence)) {
> >>>>> +                       signaled = old_fence;
> >>>>> +                       signaled_idx = i;
> >>>>> +               }
> >>>> How much do we care about only keeping one fence per-ctx here? You could
> >>>> rearrange this to break on old_fence->context == fence->context ||
> >>>> dma_fence_is_signaled(old_fence) then everyone can use the final block.
> >>> Yeah, that is what David Zhou suggested as well.
> >>>
> >>> I've rejected this approach for now cause I think we still have cases
> >>> where we rely on one fence per ctx (but I'm not 100% sure).
> >>>
> >>> I changed patch #1 in this series as you suggest and going to send that
> >>> out once more in a minute.
> >>>
> >>> Can we get this upstream as is for now? I won't have much more time
> >>> working on this.
> >> Sure, we are only discussing how we might make it look tidier, pure
> >> micro-optimisation with the caveat of losing the one-fence-per-ctx
> >> guarantee.
> > Ah, one thing to note is that extra checking pushed one of our corner
> > case tests over its time limit.
> >
> > If we can completely forgo the one-fence-per-ctx here, what works really
> > well in testing is
> >
> > diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> > index 5319ac478918..5755e95fab1b 100644
> > --- a/drivers/dma-buf/reservation.c
> > +++ b/drivers/dma-buf/reservation.c
> > @@ -104,39 +104,19 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
> >                                        struct reservation_object_list *fobj,
> >                                        struct dma_fence *fence)
> >   {
> > -       struct dma_fence *replace = NULL;
> > -       u32 ctx = fence->context;
> > -       u32 i;
> > -
> >          dma_fence_get(fence);
> >   
> >          preempt_disable();
> >          write_seqcount_begin(&obj->seq);
> >   
> > -       for (i = 0; i < fobj->shared_count; ++i) {
> > -               struct dma_fence *check;
> > -
> > -               check = rcu_dereference_protected(fobj->shared[i],
> > -                                                 reservation_object_held(obj));
> > -
> > -               if (check->context == ctx || dma_fence_is_signaled(check)) {
> > -                       replace = old_fence;
> > -                       break;
> > -               }
> > -       }
> > -
> >          /*
> >           * memory barrier is added by write_seqcount_begin,
> >           * fobj->shared_count is protected by this lock too
> >           */
> > -       RCU_INIT_POINTER(fobj->shared[i], fence);
> > -       if (!replace)
> > -               fobj->shared_count++;
> > +       RCU_INIT_POINTER(fobj->shared[fobj->shared_count++], fence);
> >   
> >          write_seqcount_end(&obj->seq);
> >          preempt_enable();
> > -
> > -       dma_fence_put(replace);
> >   }
> >   
> >   static void
> >
> >   i.e. don't check when not replacing the shared[], on creating the new
> >   buffer we then discard all the old fences.
> >
> > It should work for amdgpu as well since you do a ht to coalesce
> > redundant fences before queuing.
> 
> That won't work for all cases. This way the reservation object would 
> keep growing without a chance to ever shrink.

We only keep the active fences when it grows, which is effective enough
to keep it in check on the workloads I can find in the hour since
noticing the failure in CI ;) And on the workloads where it is being
flooded with live fences from many contexts, the order-of-magnitude
throughput improvement is not easy to ignore.
-Chris

* Re: [PATCH 2/2] dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace
  2017-11-15 17:43                   ` Chris Wilson
@ 2017-11-15 18:56                     ` Christian König
  2017-11-15 19:30                       ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Christian König @ 2017-11-15 18:56 UTC (permalink / raw)
  To: Chris Wilson, Christian König, dri-devel, amd-gfx

Am 15.11.2017 um 18:43 schrieb Chris Wilson:
> Quoting Christian König (2017-11-15 17:34:07)
>> Am 15.11.2017 um 17:55 schrieb Chris Wilson:
>>> Quoting Chris Wilson (2017-11-14 14:34:05)
>>>> Quoting Christian König (2017-11-14 14:24:44)
>>>>> Am 06.11.2017 um 17:22 schrieb Chris Wilson:
>>>>>> Quoting Christian König (2017-10-30 14:59:04)
>>>>>>> @@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>>>>>>>                            dma_fence_put(old_fence);
>>>>>>>                            return;
>>>>>>>                    }
>>>>>>> +
>>>>>>> +               if (!signaled && dma_fence_is_signaled(old_fence)) {
>>>>>>> +                       signaled = old_fence;
>>>>>>> +                       signaled_idx = i;
>>>>>>> +               }
>>>>>> How much do we care about only keeping one fence per-ctx here? You could
>>>>>> rearrange this to break on old_fence->context == fence->context ||
>>>>>> dma_fence_is_signaled(old_fence) then everyone can use the final block.
>>>>> Yeah, that is what David Zhou suggested as well.
>>>>>
>>>>> I've rejected this approach for now cause I think we still have cases
>>>>> where we rely on one fence per ctx (but I'm not 100% sure).
>>>>>
>>>>> I changed patch #1 in this series as you suggest and going to send that
>>>>> out once more in a minute.
>>>>>
>>>>> Can we get this upstream as is for now? I won't have much more time
>>>>> working on this.
>>>> Sure, we are only discussing how we might make it look tidier, pure
>>>> micro-optimisation with the caveat of losing the one-fence-per-ctx
>>>> guarantee.
>>> Ah, one thing to note is that extra checking pushed one of our corner
>>> case tests over its time limit.
>>>
>>> If we can completely forgo the one-fence-per-ctx here, what works really
>>> well in testing is
>>>
>>> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
>>> index 5319ac478918..5755e95fab1b 100644
>>> --- a/drivers/dma-buf/reservation.c
>>> +++ b/drivers/dma-buf/reservation.c
>>> @@ -104,39 +104,19 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>>>                                         struct reservation_object_list *fobj,
>>>                                         struct dma_fence *fence)
>>>    {
>>> -       struct dma_fence *replace = NULL;
>>> -       u32 ctx = fence->context;
>>> -       u32 i;
>>> -
>>>           dma_fence_get(fence);
>>>    
>>>           preempt_disable();
>>>           write_seqcount_begin(&obj->seq);
>>>    
>>> -       for (i = 0; i < fobj->shared_count; ++i) {
>>> -               struct dma_fence *check;
>>> -
>>> -               check = rcu_dereference_protected(fobj->shared[i],
>>> -                                                 reservation_object_held(obj));
>>> -
>>> -               if (check->context == ctx || dma_fence_is_signaled(check)) {
>>> -                       replace = old_fence;
>>> -                       break;
>>> -               }
>>> -       }
>>> -
>>>           /*
>>>            * memory barrier is added by write_seqcount_begin,
>>>            * fobj->shared_count is protected by this lock too
>>>            */
>>> -       RCU_INIT_POINTER(fobj->shared[i], fence);
>>> -       if (!replace)
>>> -               fobj->shared_count++;
>>> +       RCU_INIT_POINTER(fobj->shared[fobj->shared_count++], fence);
>>>    
>>>           write_seqcount_end(&obj->seq);
>>>           preempt_enable();
>>> -
>>> -       dma_fence_put(replace);
>>>    }
>>>    
>>>    static void
>>>
>>>    i.e. don't check when not replacing the shared[], on creating the new
>>>    buffer we then discard all the old fences.
>>>
>>> It should work for amdgpu as well since you do a ht to coalesce
>>> redundant fences before queuing.
>> That won't work for all cases. This way the reservation object would
>> keep growing without a chance to ever shrink.
> We only keep the active fences when it grows, which is effective enough
> to keep it in check on the workloads I can find in the hour since
> noticing the failure in CI ;)

Not sure what tests you run, but this change means that we basically 
just add the fence to an ever-growing list of fences on every command 
submission, which is only reaped when we double the list size.

That is a serious no-go as far as I can see. What we could do is improve 
reservation_object_reserve_shared() as well to figure out how many 
signaled fences are actually in the array when we reallocate it.
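
One possible shape for that idea, sketched with toy types rather than the real reservation_object_reserve_shared() (the sizing policy here is purely illustrative): count the fences that are still live, and size the replacement array for the survivors plus the one fence about to be added, instead of blindly doubling.

```c
#include <assert.h>
#include <stddef.h>

struct sig_fence { int signaled; };

/*
 * Hypothetical sizing helper: how big the reallocated shared list
 * needs to be if signaled fences are dropped during the copy.
 */
size_t size_for_live(struct sig_fence **shared, size_t shared_count)
{
	size_t i, live = 0;

	for (i = 0; i < shared_count; ++i)
		if (!shared[i]->signaled)
			live++;

	/* room for the survivors plus the fence about to be added */
	return live + 1;
}
```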

> And on the workloads where it is being
> flooded with live fences from many contexts, the order of magnitude
> throughput improvement is not easy to ignore.

Well, I'm not sure about the Intel driver, but since we started to mostly 
use a single reservation object per process, the time spent in those 
functions on amdgpu is completely negligible.

Regards,
Christian.

> -Chris



* Re: [PATCH 2/2] dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace
  2017-11-15 18:56                     ` Christian König
@ 2017-11-15 19:30                       ` Chris Wilson
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2017-11-15 19:30 UTC (permalink / raw)
  To: christian.koenig, Christian König

Quoting Christian König (2017-11-15 18:56:43)
> Am 15.11.2017 um 18:43 schrieb Chris Wilson:
> > Quoting Christian König (2017-11-15 17:34:07)
> >> Am 15.11.2017 um 17:55 schrieb Chris Wilson:
> >>> Quoting Chris Wilson (2017-11-14 14:34:05)
> >>>> Quoting Christian König (2017-11-14 14:24:44)
> >>>>> Am 06.11.2017 um 17:22 schrieb Chris Wilson:
> >>>>>> Quoting Christian König (2017-10-30 14:59:04)
> >>>>>>> @@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
> >>>>>>>                            dma_fence_put(old_fence);
> >>>>>>>                            return;
> >>>>>>>                    }
> >>>>>>> +
> >>>>>>> +               if (!signaled && dma_fence_is_signaled(old_fence)) {
> >>>>>>> +                       signaled = old_fence;
> >>>>>>> +                       signaled_idx = i;
> >>>>>>> +               }
> >>>>>> How much do we care about only keeping one fence per-ctx here? You could
> >>>>>> rearrange this to break on old_fence->context == fence->context ||
> >>>>>> dma_fence_is_signaled(old_fence) then everyone can use the final block.
> >>>>> Yeah, that is what David Zhou suggested as well.
> >>>>>
> >>>>> I've rejected this approach for now cause I think we still have cases
> >>>>> where we rely on one fence per ctx (but I'm not 100% sure).
> >>>>>
> >>>>> I changed patch #1 in this series as you suggest and going to send that
> >>>>> out once more in a minute.
> >>>>>
> >>>>> Can we get this upstream as is for now? I won't have much more time
> >>>>> working on this.
> >>>> Sure, we are only discussing how we might make it look tidier, pure
> >>>> micro-optimisation with the caveat of losing the one-fence-per-ctx
> >>>> guarantee.
> >>> Ah, one thing to note is that extra checking pushed one of our corner
> >>> case tests over its time limit.
> >>>
> >>> If we can completely forgo the one-fence-per-ctx here, what works really
> >>> well in testing is
> >>>
> >>> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> >>> index 5319ac478918..5755e95fab1b 100644
> >>> --- a/drivers/dma-buf/reservation.c
> >>> +++ b/drivers/dma-buf/reservation.c
> >>> @@ -104,39 +104,19 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
> >>>                                         struct reservation_object_list *fobj,
> >>>                                         struct dma_fence *fence)
> >>>    {
> >>> -       struct dma_fence *replace = NULL;
> >>> -       u32 ctx = fence->context;
> >>> -       u32 i;
> >>> -
> >>>           dma_fence_get(fence);
> >>>    
> >>>           preempt_disable();
> >>>           write_seqcount_begin(&obj->seq);
> >>>    
> >>> -       for (i = 0; i < fobj->shared_count; ++i) {
> >>> -               struct dma_fence *check;
> >>> -
> >>> -               check = rcu_dereference_protected(fobj->shared[i],
> >>> -                                                 reservation_object_held(obj));
> >>> -
> >>> -               if (check->context == ctx || dma_fence_is_signaled(check)) {
> >>> -                       replace = old_fence;
> >>> -                       break;
> >>> -               }
> >>> -       }
> >>> -
> >>>           /*
> >>>            * memory barrier is added by write_seqcount_begin,
> >>>            * fobj->shared_count is protected by this lock too
> >>>            */
> >>> -       RCU_INIT_POINTER(fobj->shared[i], fence);
> >>> -       if (!replace)
> >>> -               fobj->shared_count++;
> >>> +       RCU_INIT_POINTER(fobj->shared[fobj->shared_count++], fence);
> >>>    
> >>>           write_seqcount_end(&obj->seq);
> >>>           preempt_enable();
> >>> -
> >>> -       dma_fence_put(replace);
> >>>    }
> >>>    
> >>>    static void
> >>>
> >>>    i.e. don't check when not replacing the shared[], on creating the new
> >>>    buffer we then discard all the old fences.
> >>>
> >>> It should work for amdgpu as well since you do a ht to coalesce
> >>> redundant fences before queuing.
> >> That won't work for all cases. This way the reservation object would
> >> keep growing without a chance to ever shrink.
> > We only keep the active fences when it grows, which is effective enough
> > to keep it in check on the workloads I can find in the hour since
> > noticing the failure in CI ;)
> 
> Not sure what tests you run, but this change means that we basically 
> just add the fence to an ever growing list of fences on every command 
> submission which is only reaped when we double the list size.

Sure, but the frequency of doubling is also halved every time, and the
entire array of shared fences is reaped when idle. Just throwing the issue
out there: something that was already slow just got noticeably slower, and
if we ignore it and only reap on reallocate, the slowdown vanishes.
(It's just one case that happened to be caught by CI because it
triggered a timeout; what CI isn't telling us is the dramatic improvement
in other cases from avoiding an array of 1,000,000 stale fences on every
object.)
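
The doubling-frequency point is easy to check with a toy counter (illustrative only; capacity 4 is the shared list's usual starting size): the number of reallocations grows only logarithmically with the number of appended fences.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many reallocations a grow-by-doubling policy performs
 * while appending n entries, starting from a capacity of 4.
 */
size_t count_grows(size_t n)
{
	size_t cap = 4, len = 0, grows = 0;

	while (len < n) {
		if (len == cap) {
			cap *= 2;	/* reallocate (and, in the real code, reap) */
			grows++;
		}
		len++;
	}
	return grows;
}
```

Even a million appended fences trigger only 18 grows, so work done only at reallocation time is well amortized.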

> That is a serious no-go as far as I can see. What we could do is improve 
> reservation_object_reserve_shared() as well to figure out how many 
> signaled fences are actually in the array when we reallocate it.

Just amortizing all the cost to the reallocate and then avoiding the
growth should be enough. In the past I sent a conversion of the shared
fences over to a compressed idr to avoid the cost of searching the
shared array. It doesn't have the built-in reaping of stale fences, but
we can do the discard on idle.
 
 
> > And on the workloads where it is being
> > flooded with live fences from many contexts, the order of magnitude
> > throughput improvement is not easy to ignore.
> 
> Well, not sure about the Intel driver, but since we started to mostly use 
> a single reservation object per process, the time spent in those 
> functions on amdgpu is completely negligible.

Implicit fence tracking is at object granularity. Those objects are
everywhere and very costly, until we can change the entirety of userspace
(and dmabuf) to switch over to only using explicit fencing. Then it becomes
somebody else's problem and the kernel need only track HW access, which is
a much smaller (and predefined) problem.
-Chris



Thread overview: 17+ messages
2017-10-30 14:59 [PATCH 1/2] dma-buf: keep only not signaled fence in reservation_object_add_shared_replace v2 Christian König
2017-10-30 14:59 ` [PATCH 2/2] dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace Christian König
2017-11-06 16:22   ` Chris Wilson
     [not found]     ` <150998532204.10527.4781311190552781355-M6iVdVfohj6unts5RBS2dVaTQe2KTcn/@public.gmane.org>
2017-11-14 14:24       ` Christian König
     [not found]         ` <4fc05061-e92d-1702-a0f6-ee337b7d3eb3-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-11-14 14:34           ` Chris Wilson
2017-11-15 16:55             ` Chris Wilson
2017-11-15 17:34               ` Christian König
     [not found]                 ` <1a565164-85cf-0678-b6ab-802ec047bb91-5C7GfCeVMHo@public.gmane.org>
2017-11-15 17:43                   ` Chris Wilson
2017-11-15 18:56                     ` Christian König
2017-11-15 19:30                       ` Chris Wilson
     [not found] ` <20171030145904.18584-1-deathsimple-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-11-02 12:26   ` [PATCH 1/2] dma-buf: keep only not signaled fence in reservation_object_add_shared_replace v2 Sumit Semwal
2017-11-06 16:18   ` Chris Wilson
  -- strict thread matches above, loose matches on Subject: below --
2017-10-31  8:41 Christian König
2017-10-31  8:43 Christian König
     [not found] ` <20171031084306.1986-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2017-10-31  8:51   ` Chunming Zhou
2017-10-31  8:55     ` Christian König
     [not found]       ` <bb7fb308-8e1d-96c1-6474-c1bd35fd4601-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-10-31  9:10         ` Chunming Zhou
