From: Boris Brezillon <boris.brezillon@collabora.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: Sarah Walker <sarah.walker@imgtec.com>,
dri-devel@lists.freedesktop.org,
Luben Tuikov <luben.tuikov@amd.com>,
Donald Robson <donald.robson@imgtec.com>,
Sumit Semwal <sumit.semwal@linaro.org>
Subject: Re: [RFC PATCH] drm/sched: Wait for the currently popped dependency in kill_jobs_cb()
Date: Fri, 9 Jun 2023 16:10:06 +0200 [thread overview]
Message-ID: <20230609161006.7992a477@collabora.com> (raw)
In-Reply-To: <bec5582e-2172-cb01-c8b2-283b9aab414a@amd.com>
Hello Christian,
On Fri, 9 Jun 2023 13:53:59 +0200
Christian König <christian.koenig@amd.com> wrote:
> Am 08.06.23 um 08:55 schrieb Boris Brezillon:
> > If I understand correctly, drm_sched_entity_kill_jobs_cb() is supposed
> > to wait on all the external dependencies (those added to
> > drm_sched_job::dependencies) before signaling the job finished fence.
> > This is done this way to prevent jobs depending on these canceled jobs
> > from considering the resources they want to access as ready, when
> > they're actually still used by other components, thus leading to
> > potential data corruptions.
> >
> > The problem is, the kill_jobs logic is omitting the last fence popped
> > from the dependencies array that was waited upon before
> > drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
> > so we're basically waiting for all dependencies except one.
> >
> > This is an attempt at fixing that, but also an opportunity to make sure
> > I understood the drm_sched_entity_kill(), because I'm not 100% sure if
> > skipping this currently popped dependency was intentional or not. I can't
> > see a good reason why we'd want to skip that one, but I might be missing
> > something.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Cc: Frank Binns <frank.binns@imgtec.com>
> > Cc: Sarah Walker <sarah.walker@imgtec.com>
> > Cc: Donald Robson <donald.robson@imgtec.com>
> > Cc: Luben Tuikov <luben.tuikov@amd.com>
> > Cc: David Airlie <airlied@gmail.com>
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > ---
> > Stumbled across this issue while working on native dependency support
> > with Donald (which will be posted soon). Flagged as RFC, because I'm
> > not sure this is legit, and also not sure we want to fix it this way.
> > I tried re-using drm_sched_entity::dependency, but it's a bit of a mess
> > because of the asynchronousity of the wait, and the fact we use
> > drm_sched_entity::dependency to know if we have a clear_dep()
> > callback registered, so we can easily reset it without removing the
> > callback.
>
> Well yes, that's a known problem. But this is really not the right
> approach to fixing this.
>
> Trying to wait for all the dependencies before killing jobs was added
> because of the way we kept track of dma_fences in dma_resv objects.
> Basically adding exclusive fences removed all other fences leading to a
> bit fragile memory management.
Okay.
>
> This handling was removed by now and so the workaround for waiting for
> dependencies is not really necessary any more, but I think it is still
> better to do so.
>
> The right approach of getting this waiting for dependencies completely
> straight is also not to touch entity->dependency in any way, but to stop
> removing them from the XA in drm_sched_job_dependency(). Otherwise you
> don't catch the pipeline optimized ones either.
Do you want me to post a v2 doing that, or should I forget about it?
If we decide to keep things like that, it might be good to at least
add a comment explaining why we don't care.
Regards,
Boris
next prev parent reply other threads:[~2023-06-09 14:10 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-08 6:55 [RFC PATCH] drm/sched: Wait for the currently popped dependency in kill_jobs_cb() Boris Brezillon
2023-06-08 7:05 ` Boris Brezillon
2023-06-09 11:53 ` Christian König
2023-06-09 14:10 ` Boris Brezillon [this message]
2023-06-09 14:29 ` Christian König
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230609161006.7992a477@collabora.com \
--to=boris.brezillon@collabora.com \
--cc=christian.koenig@amd.com \
--cc=donald.robson@imgtec.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=luben.tuikov@amd.com \
--cc=sarah.walker@imgtec.com \
--cc=sumit.semwal@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.