[PATCH v6] drm/sched: Make sure we wait for all dependencies in kill_jobs

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v6] drm/sched: Make sure we wait for all dependencies in kill_jobs_cb()
@ 2023-06-19  7:19 Boris Brezillon
  2023-06-21 13:56 ` Luben Tuikov
  0 siblings, 1 reply; 8+ messages in thread
From: Boris Brezillon @ 2023-06-19  7:19 UTC (permalink / raw)
  To: dri-devel
  Cc: Luben Tuikov, Sarah Walker, Sumit Semwal, Boris Brezillon,
	Donald Robson, Christian König

drm_sched_entity_kill_jobs_cb() logic is omitting the last fence popped
from the dependency array that was waited upon before
drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
so we're basically waiting for all dependencies except one.

In theory, this wait shouldn't be needed because resources should have
their users registered to the dma_resv object, thus guaranteeing that
future jobs wanting to access these resources wait on all the previous
users (depending on the access type, of course). But we want to keep
these explicit waits in the kill entity path just in case.

Let's make sure we keep all dependencies in the array in
drm_sched_job_dependency(), so we can iterate over the array and wait
in drm_sched_entity_kill_jobs_cb().

We also make sure we wait on drm_sched_fence::finished if we were
originally asked to wait on drm_sched_fence::scheduled. In that case,
we assume the intent was to delegate the wait to the firmware/GPU or
rely on the pipelining done at the entity/scheduler level, but when
killing jobs, we really want to wait for completion not just scheduling.

v6:
- Back to v4 implementation
- Add Christian's R-b

v5:
- Flag deps on which we should only wait for the scheduled event
  at insertion time

v4:
- Fix commit message
- Fix a use-after-free bug

v3:
- Always wait for drm_sched_fence::finished fences in
  drm_sched_entity_kill_jobs_cb() when we see a sched_fence

v2:
- Don't evict deps in drm_sched_job_dependency()

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Suggested-by: "Christian König" <christian.koenig@amd.com>
Reviewed-by: "Christian König" <christian.koenig@amd.com>
Cc: Frank Binns <frank.binns@imgtec.com>
Cc: Sarah Walker <sarah.walker@imgtec.com>
Cc: Donald Robson <donald.robson@imgtec.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 41 +++++++++++++++++++-----
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 68e807ae136a..ec41d82d0141 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -176,16 +176,32 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
 {
 	struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
 						 finish_cb);
-	int r;
+	unsigned long index;
 
 	dma_fence_put(f);
 
 	/* Wait for all dependencies to avoid data corruptions */
-	while (!xa_empty(&job->dependencies)) {
-		f = xa_erase(&job->dependencies, job->last_dependency++);
-		r = dma_fence_add_callback(f, &job->finish_cb,
-					   drm_sched_entity_kill_jobs_cb);
-		if (!r)
+	xa_for_each(&job->dependencies, index, f) {
+		struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
+
+		if (s_fence && f == &s_fence->scheduled) {
+			/* The dependencies array had a reference on the scheduled
+			 * fence, and the finished fence refcount might have
+			 * dropped to zero. Use dma_fence_get_rcu() so we get
+			 * a NULL fence in that case.
+			 */
+			f = dma_fence_get_rcu(&s_fence->finished);
+
+			/* Now that we have a reference on the finished fence,
+			 * we can release the reference the dependencies array
+			 * had on the scheduled fence.
+			 */
+			dma_fence_put(&s_fence->scheduled);
+		}
+
+		xa_erase(&job->dependencies, index);
+		if (f && !dma_fence_add_callback(f, &job->finish_cb,
+						 drm_sched_entity_kill_jobs_cb))
 			return;
 
 		dma_fence_put(f);
@@ -415,8 +431,17 @@ static struct dma_fence *
 drm_sched_job_dependency(struct drm_sched_job *job,
 			 struct drm_sched_entity *entity)
 {
-	if (!xa_empty(&job->dependencies))
-		return xa_erase(&job->dependencies, job->last_dependency++);
+	struct dma_fence *f;
+
+	/* We keep the fence around, so we can iterate over all dependencies
+	 * in drm_sched_entity_kill_jobs_cb() to ensure all deps are signaled
+	 * before killing the job.
+	 */
+	f = xa_load(&job->dependencies, job->last_dependency);
+	if (f) {
+		job->last_dependency++;
+		return dma_fence_get(f);
+	}
 
 	if (job->sched->ops->prepare_job)
 		return job->sched->ops->prepare_job(job, entity);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v6] drm/sched: Make sure we wait for all dependencies in kill_jobs_cb()
  2023-06-19  7:19 [PATCH v6] drm/sched: Make sure we wait for all dependencies in kill_jobs_cb() Boris Brezillon
@ 2023-06-21 13:56 ` Luben Tuikov
  2023-06-21 14:18   ` Boris Brezillon
  0 siblings, 1 reply; 8+ messages in thread
From: Luben Tuikov @ 2023-06-21 13:56 UTC (permalink / raw)
  To: Boris Brezillon, dri-devel
  Cc: Sarah Walker, Sumit Semwal, Donald Robson, Christian König

On 2023-06-19 03:19, Boris Brezillon wrote:
> drm_sched_entity_kill_jobs_cb() logic is omitting the last fence popped
> from the dependency array that was waited upon before
> drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
> so we're basically waiting for all dependencies except one.
> 
> In theory, this wait shouldn't be needed because resources should have
> their users registered to the dma_resv object, thus guaranteeing that
> future jobs wanting to access these resources wait on all the previous
> users (depending on the access type, of course). But we want to keep
> these explicit waits in the kill entity path just in case.
> 
> Let's make sure we keep all dependencies in the array in
> drm_sched_job_dependency(), so we can iterate over the array and wait
> in drm_sched_entity_kill_jobs_cb().
> 
> We also make sure we wait on drm_sched_fence::finished if we were
> originally asked to wait on drm_sched_fence::scheduled. In that case,
> we assume the intent was to delegate the wait to the firmware/GPU or
> rely on the pipelining done at the entity/scheduler level, but when
> killing jobs, we really want to wait for completion not just scheduling.
> 
> v6:
> - Back to v4 implementation
> - Add Christian's R-b
> 
> v5:
> - Flag deps on which we should only wait for the scheduled event
>   at insertion time
> 
> v4:
> - Fix commit message
> - Fix a use-after-free bug
> 
> v3:
> - Always wait for drm_sched_fence::finished fences in
>   drm_sched_entity_kill_jobs_cb() when we see a sched_fence
> 
> v2:
> - Don't evict deps in drm_sched_job_dependency()

Hmm, why is this in reverse chronological order?
It's very confusing.

> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Suggested-by: "Christian König" <christian.koenig@amd.com>
> Reviewed-by: "Christian König" <christian.koenig@amd.com>

These three lines would usually come after the CCs.

Regards,
Luben

> Cc: Frank Binns <frank.binns@imgtec.com>
> Cc: Sarah Walker <sarah.walker@imgtec.com>
> Cc: Donald Robson <donald.robson@imgtec.com>
> Cc: Luben Tuikov <luben.tuikov@amd.com>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: "Christian König" <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c | 41 +++++++++++++++++++-----
>  1 file changed, 33 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 68e807ae136a..ec41d82d0141 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -176,16 +176,32 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
>  {
>  	struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
>  						 finish_cb);
> -	int r;
> +	unsigned long index;
>  
>  	dma_fence_put(f);
>  
>  	/* Wait for all dependencies to avoid data corruptions */
> -	while (!xa_empty(&job->dependencies)) {
> -		f = xa_erase(&job->dependencies, job->last_dependency++);
> -		r = dma_fence_add_callback(f, &job->finish_cb,
> -					   drm_sched_entity_kill_jobs_cb);
> -		if (!r)
> +	xa_for_each(&job->dependencies, index, f) {
> +		struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
> +
> +		if (s_fence && f == &s_fence->scheduled) {
> +			/* The dependencies array had a reference on the scheduled
> +			 * fence, and the finished fence refcount might have
> +			 * dropped to zero. Use dma_fence_get_rcu() so we get
> +			 * a NULL fence in that case.
> +			 */
> +			f = dma_fence_get_rcu(&s_fence->finished);
> +
> +			/* Now that we have a reference on the finished fence,
> +			 * we can release the reference the dependencies array
> +			 * had on the scheduled fence.
> +			 */
> +			dma_fence_put(&s_fence->scheduled);
> +		}
> +
> +		xa_erase(&job->dependencies, index);
> +		if (f && !dma_fence_add_callback(f, &job->finish_cb,
> +						 drm_sched_entity_kill_jobs_cb))
>  			return;
>  
>  		dma_fence_put(f);
> @@ -415,8 +431,17 @@ static struct dma_fence *
>  drm_sched_job_dependency(struct drm_sched_job *job,
>  			 struct drm_sched_entity *entity)
>  {
> -	if (!xa_empty(&job->dependencies))
> -		return xa_erase(&job->dependencies, job->last_dependency++);
> +	struct dma_fence *f;
> +
> +	/* We keep the fence around, so we can iterate over all dependencies
> +	 * in drm_sched_entity_kill_jobs_cb() to ensure all deps are signaled
> +	 * before killing the job.
> +	 */
> +	f = xa_load(&job->dependencies, job->last_dependency);
> +	if (f) {
> +		job->last_dependency++;
> +		return dma_fence_get(f);
> +	}
>  
>  	if (job->sched->ops->prepare_job)
>  		return job->sched->ops->prepare_job(job, entity);


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v6] drm/sched: Make sure we wait for all dependencies in kill_jobs_cb()
  2023-06-21 13:56 ` Luben Tuikov
@ 2023-06-21 14:18   ` Boris Brezillon
  2023-06-21 14:41     ` Luben Tuikov
  0 siblings, 1 reply; 8+ messages in thread
From: Boris Brezillon @ 2023-06-21 14:18 UTC (permalink / raw)
  To: Luben Tuikov
  Cc: Sarah Walker, dri-devel, Sumit Semwal, Donald Robson,
	Christian König

Hello Luben,

On Wed, 21 Jun 2023 09:56:40 -0400
Luben Tuikov <luben.tuikov@amd.com> wrote:

> On 2023-06-19 03:19, Boris Brezillon wrote:
> > drm_sched_entity_kill_jobs_cb() logic is omitting the last fence popped
> > from the dependency array that was waited upon before
> > drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
> > so we're basically waiting for all dependencies except one.
> > 
> > In theory, this wait shouldn't be needed because resources should have
> > their users registered to the dma_resv object, thus guaranteeing that
> > future jobs wanting to access these resources wait on all the previous
> > users (depending on the access type, of course). But we want to keep
> > these explicit waits in the kill entity path just in case.
> > 
> > Let's make sure we keep all dependencies in the array in
> > drm_sched_job_dependency(), so we can iterate over the array and wait
> > in drm_sched_entity_kill_jobs_cb().
> > 
> > We also make sure we wait on drm_sched_fence::finished if we were
> > originally asked to wait on drm_sched_fence::scheduled. In that case,
> > we assume the intent was to delegate the wait to the firmware/GPU or
> > rely on the pipelining done at the entity/scheduler level, but when
> > killing jobs, we really want to wait for completion not just scheduling.
> > 
> > v6:
> > - Back to v4 implementation
> > - Add Christian's R-b
> > 
> > v5:
> > - Flag deps on which we should only wait for the scheduled event
> >   at insertion time
> > 
> > v4:
> > - Fix commit message
> > - Fix a use-after-free bug
> > 
> > v3:
> > - Always wait for drm_sched_fence::finished fences in
> >   drm_sched_entity_kill_jobs_cb() when we see a sched_fence
> > 
> > v2:
> > - Don't evict deps in drm_sched_job_dependency()  
> 
> Hmm, why is this in reverse chronological order?
> It's very confusing.

Dunno, that's how I've always ordered things, and quick look at some
dri-devel patches [1][2] makes me think I'm not the only one to start
from the latest submission.

[1]https://lkml.org/lkml/2023/6/19/941
[2]https://lore.kernel.org/dri-devel/cover.1686729444.git.Sandor.yu@nxp.com/T/#t

> 
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Suggested-by: "Christian König" <christian.koenig@amd.com>
> > Reviewed-by: "Christian König" <christian.koenig@amd.com>  
> 
> These three lines would usually come after the CCs.

Again, I think I've always inserted those tags before the Cc, but I can
re-order things if you prefer. Let me know if you want me to send a v7
addressing the Cc+changelog ordering.

Regards,

Boris

> 
> Regards,
> Luben
> 
> > Cc: Frank Binns <frank.binns@imgtec.com>
> > Cc: Sarah Walker <sarah.walker@imgtec.com>
> > Cc: Donald Robson <donald.robson@imgtec.com>
> > Cc: Luben Tuikov <luben.tuikov@amd.com>
> > Cc: David Airlie <airlied@gmail.com>
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > ---
> >  drivers/gpu/drm/scheduler/sched_entity.c | 41 +++++++++++++++++++-----
> >  1 file changed, 33 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index 68e807ae136a..ec41d82d0141 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -176,16 +176,32 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
> >  {
> >  	struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
> >  						 finish_cb);
> > -	int r;
> > +	unsigned long index;
> >  
> >  	dma_fence_put(f);
> >  
> >  	/* Wait for all dependencies to avoid data corruptions */
> > -	while (!xa_empty(&job->dependencies)) {
> > -		f = xa_erase(&job->dependencies, job->last_dependency++);
> > -		r = dma_fence_add_callback(f, &job->finish_cb,
> > -					   drm_sched_entity_kill_jobs_cb);
> > -		if (!r)
> > +	xa_for_each(&job->dependencies, index, f) {
> > +		struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
> > +
> > +		if (s_fence && f == &s_fence->scheduled) {
> > +			/* The dependencies array had a reference on the scheduled
> > +			 * fence, and the finished fence refcount might have
> > +			 * dropped to zero. Use dma_fence_get_rcu() so we get
> > +			 * a NULL fence in that case.
> > +			 */
> > +			f = dma_fence_get_rcu(&s_fence->finished);
> > +
> > +			/* Now that we have a reference on the finished fence,
> > +			 * we can release the reference the dependencies array
> > +			 * had on the scheduled fence.
> > +			 */
> > +			dma_fence_put(&s_fence->scheduled);
> > +		}
> > +
> > +		xa_erase(&job->dependencies, index);
> > +		if (f && !dma_fence_add_callback(f, &job->finish_cb,
> > +						 drm_sched_entity_kill_jobs_cb))
> >  			return;
> >  
> >  		dma_fence_put(f);
> > @@ -415,8 +431,17 @@ static struct dma_fence *
> >  drm_sched_job_dependency(struct drm_sched_job *job,
> >  			 struct drm_sched_entity *entity)
> >  {
> > -	if (!xa_empty(&job->dependencies))
> > -		return xa_erase(&job->dependencies, job->last_dependency++);
> > +	struct dma_fence *f;
> > +
> > +	/* We keep the fence around, so we can iterate over all dependencies
> > +	 * in drm_sched_entity_kill_jobs_cb() to ensure all deps are signaled
> > +	 * before killing the job.
> > +	 */
> > +	f = xa_load(&job->dependencies, job->last_dependency);
> > +	if (f) {
> > +		job->last_dependency++;
> > +		return dma_fence_get(f);
> > +	}
> >  
> >  	if (job->sched->ops->prepare_job)
> >  		return job->sched->ops->prepare_job(job, entity);  
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v6] drm/sched: Make sure we wait for all dependencies in kill_jobs_cb()
  2023-06-21 14:18   ` Boris Brezillon
@ 2023-06-21 14:41     ` Luben Tuikov
  2023-06-21 14:53       ` Boris Brezillon
  0 siblings, 1 reply; 8+ messages in thread
From: Luben Tuikov @ 2023-06-21 14:41 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Sarah Walker, dri-devel, Sumit Semwal, Donald Robson,
	Christian König

On 2023-06-21 10:18, Boris Brezillon wrote:
> Hello Luben,
> 
> On Wed, 21 Jun 2023 09:56:40 -0400
> Luben Tuikov <luben.tuikov@amd.com> wrote:
> 
>> On 2023-06-19 03:19, Boris Brezillon wrote:
>>> drm_sched_entity_kill_jobs_cb() logic is omitting the last fence popped
>>> from the dependency array that was waited upon before
>>> drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
>>> so we're basically waiting for all dependencies except one.
>>>
>>> In theory, this wait shouldn't be needed because resources should have
>>> their users registered to the dma_resv object, thus guaranteeing that
>>> future jobs wanting to access these resources wait on all the previous
>>> users (depending on the access type, of course). But we want to keep
>>> these explicit waits in the kill entity path just in case.
>>>
>>> Let's make sure we keep all dependencies in the array in
>>> drm_sched_job_dependency(), so we can iterate over the array and wait
>>> in drm_sched_entity_kill_jobs_cb().
>>>
>>> We also make sure we wait on drm_sched_fence::finished if we were
>>> originally asked to wait on drm_sched_fence::scheduled. In that case,
>>> we assume the intent was to delegate the wait to the firmware/GPU or
>>> rely on the pipelining done at the entity/scheduler level, but when
>>> killing jobs, we really want to wait for completion not just scheduling.
>>>
>>> v6:
>>> - Back to v4 implementation
>>> - Add Christian's R-b
>>>
>>> v5:
>>> - Flag deps on which we should only wait for the scheduled event
>>>   at insertion time
>>>
>>> v4:
>>> - Fix commit message
>>> - Fix a use-after-free bug
>>>
>>> v3:
>>> - Always wait for drm_sched_fence::finished fences in
>>>   drm_sched_entity_kill_jobs_cb() when we see a sched_fence
>>>
>>> v2:
>>> - Don't evict deps in drm_sched_job_dependency()  
>>
>> Hmm, why is this in reverse chronological order?
>> It's very confusing.
> 
> Dunno, that's how I've always ordered things, and quick look at some
> dri-devel patches [1][2] makes me think I'm not the only one to start
> from the latest submission.
> 
> [1]https://lkml.org/lkml/2023/6/19/941
> [2]https://lore.kernel.org/dri-devel/cover.1686729444.git.Sandor.yu@nxp.com/T/#t
> 
>>
>>>
>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>>> Suggested-by: "Christian König" <christian.koenig@amd.com>
>>> Reviewed-by: "Christian König" <christian.koenig@amd.com>  
>>
>> These three lines would usually come after the CCs.
> 
> Again, I think I've always inserted those tags before the Cc, but I can
> re-order things if you prefer. Let me know if you want me to send a v7
> addressing the Cc+changelog ordering.

No, it's not necessary for this patch, but in the future I'd rather follow
chronological ordering for the versions, and in the Cc list. It's similar
to how the patch description follows (narrative text) and to how we reply
back to emails, and prevalently in the kernel log in drm ("git log" should
suffice).

Reading in chronological progression builds a narrative, a picture, in one's
mind and makes it easy to see justifications for said narrative, or see reasons
to change the narrative.

That is, one can make a better decision knowing the full history, rather than
only the latest change.

(And in fact when I read the version revision list, my eyes skip over v[X]
and just read down, so I was wondering why and how Christian R-B the patch
in v2, and it wasn't until I actually saw that they were ordered in reverse
chronological order, which was in fact v6--listed first, which I'd assumed
was listed last.)

Do you have access or do you know who is pushing this patch to drm-misc-fixes?
-- 
Regards,
Luben


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v6] drm/sched: Make sure we wait for all dependencies in kill_jobs_cb()
  2023-06-21 14:41     ` Luben Tuikov
@ 2023-06-21 14:53       ` Boris Brezillon
  2023-06-21 15:03         ` Luben Tuikov
  0 siblings, 1 reply; 8+ messages in thread
From: Boris Brezillon @ 2023-06-21 14:53 UTC (permalink / raw)
  To: Luben Tuikov
  Cc: Sarah Walker, dri-devel, Sumit Semwal, Donald Robson,
	Christian König

On Wed, 21 Jun 2023 10:41:22 -0400
Luben Tuikov <luben.tuikov@amd.com> wrote:

> On 2023-06-21 10:18, Boris Brezillon wrote:
> > Hello Luben,
> > 
> > On Wed, 21 Jun 2023 09:56:40 -0400
> > Luben Tuikov <luben.tuikov@amd.com> wrote:
> >   
> >> On 2023-06-19 03:19, Boris Brezillon wrote:  
> >>> drm_sched_entity_kill_jobs_cb() logic is omitting the last fence popped
> >>> from the dependency array that was waited upon before
> >>> drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
> >>> so we're basically waiting for all dependencies except one.
> >>>
> >>> In theory, this wait shouldn't be needed because resources should have
> >>> their users registered to the dma_resv object, thus guaranteeing that
> >>> future jobs wanting to access these resources wait on all the previous
> >>> users (depending on the access type, of course). But we want to keep
> >>> these explicit waits in the kill entity path just in case.
> >>>
> >>> Let's make sure we keep all dependencies in the array in
> >>> drm_sched_job_dependency(), so we can iterate over the array and wait
> >>> in drm_sched_entity_kill_jobs_cb().
> >>>
> >>> We also make sure we wait on drm_sched_fence::finished if we were
> >>> originally asked to wait on drm_sched_fence::scheduled. In that case,
> >>> we assume the intent was to delegate the wait to the firmware/GPU or
> >>> rely on the pipelining done at the entity/scheduler level, but when
> >>> killing jobs, we really want to wait for completion not just scheduling.
> >>>
> >>> v6:
> >>> - Back to v4 implementation
> >>> - Add Christian's R-b
> >>>
> >>> v5:
> >>> - Flag deps on which we should only wait for the scheduled event
> >>>   at insertion time
> >>>
> >>> v4:
> >>> - Fix commit message
> >>> - Fix a use-after-free bug
> >>>
> >>> v3:
> >>> - Always wait for drm_sched_fence::finished fences in
> >>>   drm_sched_entity_kill_jobs_cb() when we see a sched_fence
> >>>
> >>> v2:
> >>> - Don't evict deps in drm_sched_job_dependency()    
> >>
> >> Hmm, why is this in reverse chronological order?
> >> It's very confusing.  
> > 
> > Dunno, that's how I've always ordered things, and quick look at some
> > dri-devel patches [1][2] makes me think I'm not the only one to start
> > from the latest submission.
> > 
> > [1]https://lkml.org/lkml/2023/6/19/941
> > [2]https://lore.kernel.org/dri-devel/cover.1686729444.git.Sandor.yu@nxp.com/T/#t
> >   
> >>  
> >>>
> >>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> >>> Suggested-by: "Christian König" <christian.koenig@amd.com>
> >>> Reviewed-by: "Christian König" <christian.koenig@amd.com>    
> >>
> >> These three lines would usually come after the CCs.  
> > 
> > Again, I think I've always inserted those tags before the Cc, but I can
> > re-order things if you prefer. Let me know if you want me to send a v7
> > addressing the Cc+changelog ordering.  
> 
> No, it's not necessary for this patch, but in the future I'd rather follow
> chronological ordering for the versions, and in the Cc list. It's similar
> to how the patch description follows (narrative text) and to how we reply
> back to emails, and prevalently in the kernel log in drm ("git log" should
> suffice).
> 
> Reading in chronological progression builds a narrative, a picture, in one's
> mind and makes it easy to see justifications for said narrative, or see reasons
> to change the narrative.
> 
> That is, one can make a better decision knowing the full history, rather than
> only the latest change.
> 
> (And in fact when I read the version revision list, my eyes skip over v[X]
> and just read down, so I was wondering why and how Christian R-B the patch
> in v2, and it wasn't until I actually saw that they were ordered in reverse
> chronological order, which was in fact v6--listed first, which I'd assumed
> was listed last.)
> 
> Do you have access or do you know who is pushing this patch to drm-misc-fixes?

I can push it.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v6] drm/sched: Make sure we wait for all dependencies in kill_jobs_cb()
  2023-06-21 14:53       ` Boris Brezillon
@ 2023-06-21 15:03         ` Luben Tuikov
  2023-06-22  9:56           ` Boris Brezillon
  0 siblings, 1 reply; 8+ messages in thread
From: Luben Tuikov @ 2023-06-21 15:03 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Sarah Walker, dri-devel, Sumit Semwal, Donald Robson,
	Christian König

On 2023-06-21 10:53, Boris Brezillon wrote:
> On Wed, 21 Jun 2023 10:41:22 -0400
> Luben Tuikov <luben.tuikov@amd.com> wrote:
> 
>> On 2023-06-21 10:18, Boris Brezillon wrote:
>>> Hello Luben,
>>>
>>> On Wed, 21 Jun 2023 09:56:40 -0400
>>> Luben Tuikov <luben.tuikov@amd.com> wrote:
>>>   
>>>> On 2023-06-19 03:19, Boris Brezillon wrote:  
>>>>> drm_sched_entity_kill_jobs_cb() logic is omitting the last fence popped
>>>>> from the dependency array that was waited upon before
>>>>> drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
>>>>> so we're basically waiting for all dependencies except one.
>>>>>
>>>>> In theory, this wait shouldn't be needed because resources should have
>>>>> their users registered to the dma_resv object, thus guaranteeing that
>>>>> future jobs wanting to access these resources wait on all the previous
>>>>> users (depending on the access type, of course). But we want to keep
>>>>> these explicit waits in the kill entity path just in case.
>>>>>
>>>>> Let's make sure we keep all dependencies in the array in
>>>>> drm_sched_job_dependency(), so we can iterate over the array and wait
>>>>> in drm_sched_entity_kill_jobs_cb().
>>>>>
>>>>> We also make sure we wait on drm_sched_fence::finished if we were
>>>>> originally asked to wait on drm_sched_fence::scheduled. In that case,
>>>>> we assume the intent was to delegate the wait to the firmware/GPU or
>>>>> rely on the pipelining done at the entity/scheduler level, but when
>>>>> killing jobs, we really want to wait for completion not just scheduling.
>>>>>
>>>>> v6:
>>>>> - Back to v4 implementation
>>>>> - Add Christian's R-b
>>>>>
>>>>> v5:
>>>>> - Flag deps on which we should only wait for the scheduled event
>>>>>   at insertion time
>>>>>
>>>>> v4:
>>>>> - Fix commit message
>>>>> - Fix a use-after-free bug
>>>>>
>>>>> v3:
>>>>> - Always wait for drm_sched_fence::finished fences in
>>>>>   drm_sched_entity_kill_jobs_cb() when we see a sched_fence
>>>>>
>>>>> v2:
>>>>> - Don't evict deps in drm_sched_job_dependency()    
>>>>
>>>> Hmm, why is this in reverse chronological order?
>>>> It's very confusing.  
>>>
>>> Dunno, that's how I've always ordered things, and quick look at some
>>> dri-devel patches [1][2] makes me think I'm not the only one to start
>>> from the latest submission.
>>>
>>> [1]https://lkml.org/lkml/2023/6/19/941
>>> [2]https://lore.kernel.org/dri-devel/cover.1686729444.git.Sandor.yu@nxp.com/T/#t
>>>   
>>>>  
>>>>>
>>>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>>>>> Suggested-by: "Christian König" <christian.koenig@amd.com>
>>>>> Reviewed-by: "Christian König" <christian.koenig@amd.com>    
>>>>
>>>> These three lines would usually come after the CCs.  
>>>
>>> Again, I think I've always inserted those tags before the Cc, but I can
>>> re-order things if you prefer. Let me know if you want me to send a v7
>>> addressing the Cc+changelog ordering.  
>>
>> No, it's not necessary for this patch, but in the future I'd rather follow
>> chronological ordering for the versions, and in the Cc list. It's similar
>> to how the patch description follows (narrative text) and to how we reply
>> back to emails, and prevalently in the kernel log in drm ("git log" should
>> suffice).
>>
>> Reading in chronological progression builds a narrative, a picture, in one's
>> mind and makes it easy to see justifications for said narrative, or see reasons
>> to change the narrative.
>>
>> That is, one can make a better decision knowing the full history, rather than
>> only the latest change.
>>
>> (And in fact when I read the version revision list, my eyes skip over v[X]
>> and just read down, so I was wondering why and how Christian R-B the patch
>> in v2, and it wasn't until I actually saw that they were ordered in reverse
>> chronological order, which was in fact v6--listed first, which I'd assumed
>> was listed last.)
>>
>> Do you have access or do you know who is pushing this patch to drm-misc-fixes?
> 
> I can push it.
> 

Acked-by: Luben Tuikov <luben.tuikov@amd.com>

Regards,
Luben


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v6] drm/sched: Make sure we wait for all dependencies in kill_jobs_cb()
  2023-06-21 15:03         ` Luben Tuikov
@ 2023-06-22  9:56           ` Boris Brezillon
  2023-06-22 17:21             ` Luben Tuikov
  0 siblings, 1 reply; 8+ messages in thread
From: Boris Brezillon @ 2023-06-22  9:56 UTC (permalink / raw)
  To: Luben Tuikov
  Cc: Sarah Walker, dri-devel, Sumit Semwal, Donald Robson,
	Christian König

On Wed, 21 Jun 2023 11:03:48 -0400
Luben Tuikov <luben.tuikov@amd.com> wrote:

> On 2023-06-21 10:53, Boris Brezillon wrote:
> > On Wed, 21 Jun 2023 10:41:22 -0400
> > Luben Tuikov <luben.tuikov@amd.com> wrote:
> >   
> >> On 2023-06-21 10:18, Boris Brezillon wrote:  
> >>> Hello Luben,
> >>>
> >>> On Wed, 21 Jun 2023 09:56:40 -0400
> >>> Luben Tuikov <luben.tuikov@amd.com> wrote:
> >>>     
> >>>> On 2023-06-19 03:19, Boris Brezillon wrote:    
> >>>>> drm_sched_entity_kill_jobs_cb() logic is omitting the last fence popped
> >>>>> from the dependency array that was waited upon before
> >>>>> drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
> >>>>> so we're basically waiting for all dependencies except one.
> >>>>>
> >>>>> In theory, this wait shouldn't be needed because resources should have
> >>>>> their users registered to the dma_resv object, thus guaranteeing that
> >>>>> future jobs wanting to access these resources wait on all the previous
> >>>>> users (depending on the access type, of course). But we want to keep
> >>>>> these explicit waits in the kill entity path just in case.
> >>>>>
> >>>>> Let's make sure we keep all dependencies in the array in
> >>>>> drm_sched_job_dependency(), so we can iterate over the array and wait
> >>>>> in drm_sched_entity_kill_jobs_cb().
> >>>>>
> >>>>> We also make sure we wait on drm_sched_fence::finished if we were
> >>>>> originally asked to wait on drm_sched_fence::scheduled. In that case,
> >>>>> we assume the intent was to delegate the wait to the firmware/GPU or
> >>>>> rely on the pipelining done at the entity/scheduler level, but when
> >>>>> killing jobs, we really want to wait for completion not just scheduling.
> >>>>>
> >>>>> v6:
> >>>>> - Back to v4 implementation
> >>>>> - Add Christian's R-b
> >>>>>
> >>>>> v5:
> >>>>> - Flag deps on which we should only wait for the scheduled event
> >>>>>   at insertion time
> >>>>>
> >>>>> v4:
> >>>>> - Fix commit message
> >>>>> - Fix a use-after-free bug
> >>>>>
> >>>>> v3:
> >>>>> - Always wait for drm_sched_fence::finished fences in
> >>>>>   drm_sched_entity_kill_jobs_cb() when we see a sched_fence
> >>>>>
> >>>>> v2:
> >>>>> - Don't evict deps in drm_sched_job_dependency()      
> >>>>
> >>>> Hmm, why is this in reverse chronological order?
> >>>> It's very confusing.    
> >>>
> >>> Dunno, that's how I've always ordered things, and quick look at some
> >>> dri-devel patches [1][2] makes me think I'm not the only one to start
> >>> from the latest submission.
> >>>
> >>> [1]https://lkml.org/lkml/2023/6/19/941
> >>> [2]https://lore.kernel.org/dri-devel/cover.1686729444.git.Sandor.yu@nxp.com/T/#t
> >>>     
> >>>>    
> >>>>>
> >>>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> >>>>> Suggested-by: "Christian König" <christian.koenig@amd.com>
> >>>>> Reviewed-by: "Christian König" <christian.koenig@amd.com>      
> >>>>
> >>>> These three lines would usually come after the CCs.    
> >>>
> >>> Again, I think I've always inserted those tags before the Cc, but I can
> >>> re-order things if you prefer. Let me know if you want me to send a v7
> >>> addressing the Cc+changelog ordering.    
> >>
> >> No, it's not necessary for this patch, but in the future I'd rather follow
> >> chronological ordering for the versions, and in the Cc list. It's similar
> >> to how the patch description follows (narrative text) and to how we reply
> >> back to emails, and prevalently in the kernel log in drm ("git log" should
> >> suffice).
> >>
> >> Reading in chronological progression builds a narrative, a picture, in one's
> >> mind and makes it easy to see justifications for said narrative, or see reasons
> >> to change the narrative.
> >>
> >> That is, one can make a better decision knowing the full history, rather than
> >> only the latest change.
> >>
> >> (And in fact when I read the version revision list, my eyes skip over v[X]
> >> and just read down, so I was wondering why and how Christian R-B the patch
> >> in v2, and it wasn't until I actually saw that they were ordered in reverse
> >> chronological order, which was in fact v6--listed first, which I'd assumed
> >> was listed last.)
> >>
> >> Do you have access or do you know who is pushing this patch to drm-misc-fixes?  
> > 
> > I can push it.
> >   
> 
> Acked-by: Luben Tuikov <luben.tuikov@amd.com>

Queued to drm-misc-fixes after re-ordering things in the commit message
as you suggested.

Regards,

Boris


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v6] drm/sched: Make sure we wait for all dependencies in kill_jobs_cb()
  2023-06-22  9:56           ` Boris Brezillon
@ 2023-06-22 17:21             ` Luben Tuikov
  0 siblings, 0 replies; 8+ messages in thread
From: Luben Tuikov @ 2023-06-22 17:21 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Sarah Walker, dri-devel, Sumit Semwal, Donald Robson,
	Christian König

On 2023-06-22 05:56, Boris Brezillon wrote:
> On Wed, 21 Jun 2023 11:03:48 -0400
> Luben Tuikov <luben.tuikov@amd.com> wrote:
> 
>> On 2023-06-21 10:53, Boris Brezillon wrote:
>>> On Wed, 21 Jun 2023 10:41:22 -0400
>>> Luben Tuikov <luben.tuikov@amd.com> wrote:
>>>   
>>>> On 2023-06-21 10:18, Boris Brezillon wrote:  
>>>>> Hello Luben,
>>>>>
>>>>> On Wed, 21 Jun 2023 09:56:40 -0400
>>>>> Luben Tuikov <luben.tuikov@amd.com> wrote:
>>>>>     
>>>>>> On 2023-06-19 03:19, Boris Brezillon wrote:    
>>>>>>> drm_sched_entity_kill_jobs_cb() logic is omitting the last fence popped
>>>>>>> from the dependency array that was waited upon before
>>>>>>> drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
>>>>>>> so we're basically waiting for all dependencies except one.
>>>>>>>
>>>>>>> In theory, this wait shouldn't be needed because resources should have
>>>>>>> their users registered to the dma_resv object, thus guaranteeing that
>>>>>>> future jobs wanting to access these resources wait on all the previous
>>>>>>> users (depending on the access type, of course). But we want to keep
>>>>>>> these explicit waits in the kill entity path just in case.
>>>>>>>
>>>>>>> Let's make sure we keep all dependencies in the array in
>>>>>>> drm_sched_job_dependency(), so we can iterate over the array and wait
>>>>>>> in drm_sched_entity_kill_jobs_cb().
>>>>>>>
>>>>>>> We also make sure we wait on drm_sched_fence::finished if we were
>>>>>>> originally asked to wait on drm_sched_fence::scheduled. In that case,
>>>>>>> we assume the intent was to delegate the wait to the firmware/GPU or
>>>>>>> rely on the pipelining done at the entity/scheduler level, but when
>>>>>>> killing jobs, we really want to wait for completion not just scheduling.
>>>>>>>
>>>>>>> v6:
>>>>>>> - Back to v4 implementation
>>>>>>> - Add Christian's R-b
>>>>>>>
>>>>>>> v5:
>>>>>>> - Flag deps on which we should only wait for the scheduled event
>>>>>>>   at insertion time
>>>>>>>
>>>>>>> v4:
>>>>>>> - Fix commit message
>>>>>>> - Fix a use-after-free bug
>>>>>>>
>>>>>>> v3:
>>>>>>> - Always wait for drm_sched_fence::finished fences in
>>>>>>>   drm_sched_entity_kill_jobs_cb() when we see a sched_fence
>>>>>>>
>>>>>>> v2:
>>>>>>> - Don't evict deps in drm_sched_job_dependency()      
>>>>>>
>>>>>> Hmm, why is this in reverse chronological order?
>>>>>> It's very confusing.    
>>>>>
>>>>> Dunno, that's how I've always ordered things, and quick look at some
>>>>> dri-devel patches [1][2] makes me think I'm not the only one to start
>>>>> from the latest submission.
>>>>>
>>>>> [1]https://lkml.org/lkml/2023/6/19/941
>>>>> [2]https://lore.kernel.org/dri-devel/cover.1686729444.git.Sandor.yu@nxp.com/T/#t
>>>>>     
>>>>>>    
>>>>>>>
>>>>>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>>>>>>> Suggested-by: "Christian König" <christian.koenig@amd.com>
>>>>>>> Reviewed-by: "Christian König" <christian.koenig@amd.com>      
>>>>>>
>>>>>> These three lines would usually come after the CCs.    
>>>>>
>>>>> Again, I think I've always inserted those tags before the Cc, but I can
>>>>> re-order things if you prefer. Let me know if you want me to send a v7
>>>>> addressing the Cc+changelog ordering.    
>>>>
>>>> No, it's not necessary for this patch, but in the future I'd rather follow
>>>> chronological ordering for the versions, and in the Cc list. It's similar
>>>> to how the patch description follows (narrative text) and to how we reply
>>>> back to emails, and prevalently in the kernel log in drm ("git log" should
>>>> suffice).
>>>>
>>>> Reading in chronological progression builds a narrative, a picture, in one's
>>>> mind and makes it easy to see justifications for said narrative, or see reasons
>>>> to change the narrative.
>>>>
>>>> That is, one can make a better decision knowing the full history, rather than
>>>> only the latest change.
>>>>
>>>> (And in fact when I read the version revision list, my eyes skip over v[X]
>>>> and just read down, so I was wondering why and how Christian R-B the patch
>>>> in v2, and it wasn't until I actually saw that they were ordered in reverse
>>>> chronological order, which was in fact v6--listed first, which I'd assumed
>>>> was listed last.)
>>>>
>>>> Do you have access or do you know who is pushing this patch to drm-misc-fixes?  
>>>
>>> I can push it.
>>>   
>>
>> Acked-by: Luben Tuikov <luben.tuikov@amd.com>
> 
> Queued to drm-misc-fixes after re-ordering things in the commit message
> as you suggested.
> 
> Regards,
> 
> Boris
> 

That's great! Thank you.
-- 
Regards,
Luben


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-06-22 17:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-19  7:19 [PATCH v6] drm/sched: Make sure we wait for all dependencies in kill_jobs_cb() Boris Brezillon
2023-06-21 13:56 ` Luben Tuikov
2023-06-21 14:18   ` Boris Brezillon
2023-06-21 14:41     ` Luben Tuikov
2023-06-21 14:53       ` Boris Brezillon
2023-06-21 15:03         ` Luben Tuikov
2023-06-22  9:56           ` Boris Brezillon
2023-06-22 17:21             ` Luben Tuikov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.