Re: [Regression] drm/scheduler: track GPU active time per entity

From: Daniel Vetter <daniel@ffwll.ch>
To: Lucas Stach <l.stach@pengutronix.de>
Cc: "Daniel Vetter" <daniel@ffwll.ch>,
	"Christian König" <christian.koenig@amd.com>,
	"Luben Tuikov" <luben.tuikov@amd.com>,
	"Danilo Krummrich" <dakr@redhat.com>,
	"Dave Airlie" <airlied@gmail.com>,
	"Bagas Sanjaya" <bagasdotme@gmail.com>,
	andrey.grodzovsky@amd.com, tvrtko.ursulin@linux.intel.com,
	"Matthew Brost" <matthew.brost@intel.com>,
	yuq825@gmail.com,
	"Boris Brezillon" <boris.brezillon@collabora.com>,
	lina@asahilina.net, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org
Subject: Re: [Regression] drm/scheduler: track GPU active time per entity
Date: Thu, 6 Apr 2023 14:09:23 +0200
Message-ID: <ZC62cxVD5xc31FEL@phenom.ffwll.local>
In-Reply-To: <9c72c7162da56234addd7083ec774e525a13957c.camel@pengutronix.de>

On Thu, Apr 06, 2023 at 12:45:12PM +0200, Lucas Stach wrote:
> On Thursday, 06.04.2023 at 10:27 +0200, Daniel Vetter wrote:
> > On Thu, 6 Apr 2023 at 10:22, Christian König <christian.koenig@amd.com> wrote:
> > > 
> > > On 05.04.23 at 18:09, Luben Tuikov wrote:
> > > > On 2023-04-05 10:05, Danilo Krummrich wrote:
> > > > > On 4/4/23 06:31, Luben Tuikov wrote:
> > > > > > On 2023-03-28 04:54, Lucas Stach wrote:
> > > > > > > Hi Danilo,
> > > > > > > 
> > > > > > > On Tuesday, 28.03.2023 at 02:57 +0200, Danilo Krummrich wrote:
> > > > > > > > Hi all,
> > > > > > > > 
> > > > > > > > Commit df622729ddbf ("drm/scheduler: track GPU active time per entity")
> > > > > > > > tries to track the accumulated time that a job was active on the GPU,
> > > > > > > > writing it to the entity through which the job was originally deployed
> > > > > > > > to the scheduler. This is done within drm_sched_get_cleanup_job(),
> > > > > > > > which fetches a job from the scheduler's pending_list.
> > > > > > > > 
> > > > > > > > Doing this can result in a race condition where the entity is already
> > > > > > > > freed, but the entity's newly added elapsed_ns field is still accessed
> > > > > > > > once the job is fetched from the pending_list.
> > > > > > > > 
> > > > > > > > After drm_sched_entity_destroy() has been called it should be safe to free
> > > > > > > > the structure that embeds the entity. However, a job originally handed
> > > > > > > > over to the scheduler by this entity might still reside in the
> > > > > > > > scheduler's pending_list for cleanup after drm_sched_entity_destroy()
> > > > > > > > has already been called and the entity has been freed. Hence, we can run
> > > > > > > > into a UAF.
> > > > > > > > 
> > > > > > > Sorry about that, I clearly didn't properly consider this case.
> > > > > > > 
> > > > > > > > In my case it happened that a job, as explained above, was just picked
> > > > > > > > from the scheduler's pending_list after the entity was freed due to the
> > > > > > > > client application exiting. Meanwhile, this freed-up memory had already
> > > > > > > > been allocated again for a subsequent client application's job structure.
> > > > > > > > Hence, the new job's memory got corrupted. Luckily, I was able to
> > > > > > > > reproduce the same corruption over and over again by just using
> > > > > > > > deqp-runner to run a specific set of VK test cases in parallel.
> > > > > > > > 
> > > > > > > > Fixing this issue doesn't seem to be very straightforward though (unless
> > > > > > > > I'm missing something), which is why I'm writing this mail instead of
> > > > > > > > sending a fix directly.
> > > > > > > > 
> > > > > > > > Spontaneously, I see three options to fix it:
> > > > > > > > 
> > > > > > > > 1. Rather than embedding the entity into driver specific structures
> > > > > > > > (e.g. tied to file_priv) we could allocate the entity separately and
> > > > > > > > reference count it, such that it's only freed up once all jobs that were
> > > > > > > > deployed through this entity are fetched from the scheduler's pending list.
> > > > > > > > 
> > > > > > > My vote is on this or something in a similar vein for the long term. I
> > > > > > > have some hope to be able to add a GPU scheduling algorithm with a bit
> > > > > > > more fairness than the current one sometime in the future, which
> > > > > > > requires execution time tracking on the entities.
> > > > > > Danilo,
> > > > > > 
> > > > > > Using kref is preferable, i.e. option 1 above.
> > > > > I think the only real motivation for doing that would be for generically
> > > > > tracking job statistics within the entity a job was deployed through. If
> > > > > we all agree on tracking job statistics this way I am happy to prepare a
> > > > > patch for this option and drop this one:
> > > > > https://lore.kernel.org/all/20230331000622.4156-1-dakr@redhat.com/T/#u
> > > > Hmm, I never thought about "job statistics" when I preferred using kref above.
> > > > The reason kref is attractive is because one doesn't need to worry about
> > > > it--when the last user drops the kref, the release is called to do
> > > > housekeeping. If this never happens, we know that we have a bug to debug.
> > > 
> > > Yeah, reference counting unfortunately has some traps as well. For
> > > example, rarely dropping the last reference from interrupt context, or
> > > with some unexpected locks held when the cleanup function doesn't expect
> > > that, is a good recipe for problems as well.
> > > 
> Fully agreed.
> 
> > > > Regarding the patch above--I did look around the code, and it seems safe,
> > > > as per your analysis. I didn't see any reference to entity after job submission,
> > > > but I'll comment on that thread as well for the record.
> > > 
> > > Reference counting the entities was suggested before. We intentionally
> > > avoided that so far because the entity might be the tip of the iceberg
> > > of stuff you need to keep around.
> > > 
> > > For example for command submission you also need the VM and when you
> > > keep the VM alive you also need to keep the file private alive....
> > 
> > Yeah refcounting often looks like the easy way out to avoid
> > use-after-free issues, until you realize you've just made lifetimes
> > unbounded and have some enormous leaks: entity keeps vm alive, vm
> > keeps all the BOs alive, somehow every crash wastes more memory
> > because vk_device_lost means userspace allocates new stuff for
> > everything.
> > 
> > If possible a lifetime design where lifetimes have hard bounds and you
> > just borrow a reference under a lock (or some other ownership rule) is
> > generally much more solid. But also much harder to design correctly
> > :-/
> > 
> The use we are discussing here is to keep the entity alive as long as
> jobs from that entity are still active on the HW. While there are no
> hard bounds on when a job will get inactive, at least it's not
> unbounded. On a crash/fault the job will be removed from the hardware
> pretty soon.
> 
> Well-behaved jobs after an application shutdown might take a little
> longer, but I don't really see the new problem with keeping the entity
> alive? As long as a job is active on the hardware, we can't throw out
> the VM or BOs, no difference whether the entity is kept alive or not.
> 
> Some hardware might have ways to expedite job inactivation by
> deactivating scheduling queues, or just taking a fault, but for some HW
> we'll just have to wait for the job to finish.

Shouldn't the scheduler's timed_out/tdr logic take care of these? It's
probably not good to block in something like the close(drmfd) or process
exit() for these, but it's all dma_fence underneath and those _must_
signal in finite time no matter what. So it shouldn't be a deadlock problem,
but might still be a "userspace really doesn't like a big stall there"
problem.
-Daniel

> Regards,
> Lucas
> 
> > > In addition to that, we have some ugly interdependencies between tearing
> > > down an application (potentially with a KILL signal from the OOM killer)
> > > and backward compatibility for some applications which render something
> > > and quit before the rendering is completed in the hardware.
> > 
> > Yeah I think that part would also be good to sort out once and for all in
> > drm/sched, because i915 has/had the same struggle.
> > -Daniel
> > 
> > > 
> > > Regards,
> > > Christian.
> > > 
> > > > 
> > > > Regards,
> > > > Luben
> > > > 
> > > > > Christian mentioned amdgpu tried something similar to what Lucas tried,
> > > > > ran into similar trouble, backed off and implemented it in another
> > > > > way - a driver specific way I guess?
> > > > > 
> > > > > > Lucas, can you shed some light on,
> > > > > > 
> > > > > > 1. In what way the current FIFO scheduling is unfair, and
> > > > > > 2. the details of this "scheduling algorithm with a bit
> > > > > > more fairness than the current one"?
> > > > > > 
> > > > > > Regards,
> > > > > > Luben
> > > > > > 
> > > > > > > > 2. Somehow make sure drm_sched_entity_destroy() blocks until all
> > > > > > > > jobs deployed through this entity were fetched from the scheduler's
> > > > > > > > pending list. Though, I'm pretty sure that this is not really desirable.
> > > > > > > > 
> > > > > > > > 3. Just revert the change and let drivers implement tracking of GPU
> > > > > > > > active times themselves.
> > > > > > > > 
> > > > > > > Given that we are already pretty late in the release cycle and etnaviv
> > > > > > > is the only driver so far making use of the scheduler elapsed time
> > > > > > > tracking, I think the right short term solution is to either move the
> > > > > > > tracking into etnaviv or just revert the change for now. I'll have a
> > > > > > > look at this.
> > > > > > > 
> > > > > > > Regards,
> > > > > > > Lucas
> > > > > > > 
> > > > > > > > In the case of just reverting the change I'd propose to also set a job's
> > > > > > > > entity pointer to NULL once the job was taken from the entity, such
> > > > > > > > that in case of a future issue we fail where the actual issue resides
> > > > > > > > and to make it more obvious that the field shouldn't be used anymore
> > > > > > > > after the job was taken from the entity.
> > > > > > > > 
> > > > > > > > I'm happy to implement the solution we agree on. However, it might also
> > > > > > > > make sense to revert the change until we have a solution in place. I'm
> > > > > > > > also happy to send a revert with a proper description of the problem.
> > > > > > > > Please let me know what you think.
> > > > > > > > 
> > > > > > > > - Danilo
> > > > > > > > 
> > > 
> > 
> > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
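
Editorial sketch of option 1 from the quoted thread (reference counting the
entity so that it outlives the jobs still sitting in the pending list). This
is a minimal userspace model in C, not kernel code: every sketch_* name is
hypothetical and not part of the drm/sched API, and the plain integer counter
only stands in for a struct kref and is not thread-safe. It shows the ordering
only: each job takes a reference at submission, the owner drops its own
reference at "destroy" time, and the cleanup path accounts the elapsed time
before dropping the job's reference.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scheduler entity; not the drm/sched struct. */
struct sketch_entity {
	unsigned int refcount;	/* plays the role of a struct kref (not atomic here) */
	uint64_t elapsed_ns;	/* accumulated GPU active time */
	int *released;		/* demo only: set to 1 when the entity is freed */
};

/* Hypothetical stand-in for a job that was submitted through an entity. */
struct sketch_job {
	struct sketch_entity *entity;
	uint64_t runtime_ns;
};

/* Allocate the entity separately; the creator owns the initial reference. */
static struct sketch_entity *sketch_entity_create(int *released)
{
	struct sketch_entity *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	e->refcount = 1;
	e->released = released;
	return e;
}

static void sketch_entity_get(struct sketch_entity *e)
{
	e->refcount++;
}

/* Drop one reference; the last put frees the entity. */
static void sketch_entity_put(struct sketch_entity *e)
{
	assert(e->refcount > 0);
	if (--e->refcount == 0) {
		*e->released = 1;
		free(e);
	}
}

/* Submission: the job pins the entity it was deployed through. */
static void sketch_job_init(struct sketch_job *job, struct sketch_entity *e,
			    uint64_t runtime_ns)
{
	sketch_entity_get(e);
	job->entity = e;
	job->runtime_ns = runtime_ns;
}

/*
 * Cleanup: account the job's time on the entity, clear the job's entity
 * pointer, then drop the job's reference. Returns the entity's total.
 */
static uint64_t sketch_job_cleanup(struct sketch_job *job)
{
	struct sketch_entity *e = job->entity;
	uint64_t total;

	e->elapsed_ns += job->runtime_ns;
	total = e->elapsed_ns;
	job->entity = NULL;
	sketch_entity_put(e);
	return total;
}
```

The sketch deliberately leaves out the concerns raised in the thread: what
else a long-lived entity keeps alive (VM, BOs, file private), and the context
and locks under which the last reference may end up being dropped.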
