From: "Christian König" <christian.koenig@amd.com>
To: "Asahi Lina" <lina@asahilina.net>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"David Airlie" <airlied@gmail.com>,
"Daniel Vetter" <daniel@ffwll.ch>,
"Miguel Ojeda" <ojeda@kernel.org>,
"Alex Gaynor" <alex.gaynor@gmail.com>,
"Wedson Almeida Filho" <wedsonaf@gmail.com>,
"Boqun Feng" <boqun.feng@gmail.com>,
"Gary Guo" <gary@garyguo.net>,
"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
"Luben Tuikov" <luben.tuikov@amd.com>,
"Jarkko Sakkinen" <jarkko@kernel.org>,
"Dave Hansen" <dave.hansen@linux.intel.com>
Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>,
Karol Herbst <kherbst@redhat.com>,
Ella Stanforth <ella@iglunix.org>,
Faith Ekstrand <faith.ekstrand@collabora.com>,
Mary <mary@mary.zone>,
linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
rust-for-linux@vger.kernel.org, linux-media@vger.kernel.org,
linaro-mm-sig@lists.linaro.org, linux-sgx@vger.kernel.org,
asahi@lists.linux.dev
Subject: Re: [PATCH RFC 10/18] drm/scheduler: Add can_run_job callback
Date: Wed, 8 Mar 2023 20:12:38 +0100
Message-ID: <d1fccceb-ca77-f653-17fc-63168e0da884@amd.com>
In-Reply-To: <c0624252-070e-bd44-2116-93a1d63a1359@asahilina.net>
On 08.03.23 at 20:05, Asahi Lina wrote:
> [SNIP]
>> Well, it's not the better way; it's the only way that works.
>>
>> I have to admit that my bet on your intentions was wrong, but even that
>> use case doesn't work correctly.
>>
>> See, when your callback returns false, it is perfectly possible that all
>> hw fences get signaled between returning that information and processing it.
>>
>> The result would be that the scheduler goes to sleep and never wakes up
>> again.
> That can't happen, because it will just go into another iteration of the
> drm_sched main loop, since an entity is still available.
>
> Rather, there is probably the opposite bug in this patch: the can_run_job
> logic should be moved into the wait_event_interruptible() condition
> check; otherwise I think it can end up busy-looping, since the condition
> itself can be true even when the can_run_job check blocks it.
>
> But there is no risk of it going to sleep and never waking up because
> job completions will wake up the waitqueue by definition, and that
> happens after the driver-side queues are popped. If this problem could
> happen, then the existing hw_submission_limit logic would be broken in
> the same way. It is logically equivalent in how it works.
>
> Basically, if properly done in wait_event_interruptible, it is exactly
> the logic of that macro that prevents this race condition and makes
> everything work at all. Without it, drm_sched would be completely broken.
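
[For illustration, a minimal sketch of the fix being described: folding
the can_run_job check into the condition that drm_sched_main() hands to
wait_event_interruptible(). The surrounding expression follows the
upstream scheduler loop of this era; peek_job() is a hypothetical
helper that looks at the entity's next job without popping it.]

    wait_event_interruptible(sched->wake_up_worker,
            /* re-evaluated on every wake_up(), so a false->true
             * transition of can_run_job cannot be missed as long
             * as job completions wake this queue */
            (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
            (!drm_sched_blocked(sched) &&
             (entity = drm_sched_select_entity(sched)) &&
             (!sched->ops->can_run_job ||
              sched->ops->can_run_job(peek_job(entity)))) ||
            kthread_should_stop());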
>
>> As I said we exercised those ideas before and yes this approach here
>> came up before as well and no it doesn't work.
> It can never deadlock with this patch as it stands (though it could
> busy-loop), and if properly moved into the wait_event_interruptible(), it
> would also never busy-loop and would work entirely as intended. The actual
> API change is sound.
>
> I don't know why you're trying so hard to convince everyone that this
> approach is fundamentally broken... It might be a bad idea for other
> reasons, it might encourage incorrect usage, it might not be the best
> option, there are plenty of arguments you can make... but you just keep
> trying to make an argument that it just can't work at all for some
> reason. Why? I already said I'm happy dropping it in favor of the fences...
Well, because it is broken.
When you move the check into the wait_event_interruptible() condition,
who is going to wake up the wait queue so that the condition is
re-evaluated when it changes?
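
[The objection, sketched: wait_event_interruptible() only re-evaluates
its condition after a wake_up() on the same waitqueue. Whatever event
turns the driver's can_run_job condition from false to true must
therefore also wake the scheduler, or it sleeps forever. Helper names
below are hypothetical.]

    /* scheduler thread: the condition is only re-tested after a
     * wake_up() on sched->wake_up_worker */
    wait_event_interruptible(sched->wake_up_worker,
                             driver_can_run_next_job(sched));

    /* driver side: whoever makes the condition true must also wake
     * the queue, otherwise the transition is lost */
    release_hw_resource(res);
    wake_up(&sched->wake_up_worker);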
As I said, this idea came up before and was rejected multiple times.
Regards,
Christian.
>
> It's intended to mirror the hw_submission_limit logic. If you think this
> is broken, then that's broken too. They are equivalent mechanisms.
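
[For reference, the existing mechanism being compared against, roughly
as it appears in drivers/gpu/drm/scheduler/sched_main.c at the time:
the submission limit is enforced by drm_sched_ready(), which gates
entity selection inside the wait_event_interruptible() condition, and
that placement is what makes it race-free.]

    static bool drm_sched_ready(struct drm_gpu_scheduler *sched)
    {
            return atomic_read(&sched->hw_rq_count) <
                    sched->hw_submission_limit;
    }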
>
>>> This particular issue aside, fairness in global resource allocation is a
>>> conversation I'd love to have! Right now the driver doesn't try to
>>> ensure that; a queue can easily monopolize certain hardware resources
>>> (though one queue can only monopolize one of each, so you'd need
>>> something like 63 queues with 63 distinct VMs all submitting
>>> free-running jobs back to back in order to starve other queues of
>>> resources forever). For starters, one thing I'm thinking of doing is
>>> reserving certain subsets of hardware resources for queues with a given
>>> priority, so you can at least guarantee forward progress of
>>> higher-priority queues when faced with misbehaving lower-priority
>>> queues. But if we want to guarantee proper fairness, I think I'll have
>>> to start doing things like switching to a CPU-roundtrip submission model
>>> when resources become scarce (to guarantee that queues actually release
>>> the resources once in a while) and then figure out how to add fairness
>>> to the allocation code...
>>>
>>> But let's have that conversation when we talk about the driver (or maybe
>>> on IRC or something?), right now I'm more interested in getting the
>>> abstractions reviewed ^^
>> Well, that stuff is highly problematic as well. Fairness aside, you
>> risk starvation, which in turn breaks the guarantee of forward progress.
>>
>> In this particular case you can catch this with a timeout for the hw
>> operation, but you should consider preventing it from the sw side as well.
> In the current state I actually think it's not really that problematic,
> because the resources are acquired directly in the ioctl path. So that
> path can block if starved; but if blocking there could stall overall
> forward progress because some fence doesn't get signaled, then simply
> not doing the ioctl in the first place would have the same effect, so
> there's not much point (userspace can always misbehave with its fence
> usage...). By the time anything gets submitted to drm_sched, the
> resources are already guaranteed to be acquired; we never block in the
> run callback.
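
[A sketch of the ordering described above, with hypothetical helper
names: resources are acquired in ioctl context, where blocking or
failing is acceptable and no fence exists yet, so by the time the job
reaches drm_sched its run callback has nothing left to block on.]

    /* ioctl path: blocking or failing here is fine, nothing is
     * queued and no fence has been published yet */
    ret = acquire_hw_resources(queue);      /* hypothetical helper */
    if (ret)
            return ret;

    drm_sched_job_arm(&job->base);
    drm_sched_entity_push_job(&job->base);  /* run_job won't block */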
>
> It needs to be fixed of course, but if the threat model is a malicious
> GPU process, well, there are many other ways to DoS your system... and I
> don't think it's very likely that 63+ queues (which usually means 63+
> processes with OpenGL) will end up accidentally starving the GPU in a
> tight loop at the same time. I'd love to hear about real-world scenarios
> where this kind of thing has been a real problem and not just a
> theoretical one though... maybe I'm missing something?
>
> Basically my priorities with the driver are:
>
> 1. Make sure it never crashes
> 2. Make sure it works well for real users
> 3. Make it work smoothly for real users under reasonable load
> (priorities, CPU scheduler interactions, etc.)
> 4. Make it handle accidental problems more gracefully (OOMs etc, I need
> to look into private GEM BO accounting to processes so the OOM killer
> has better data to work with)
> 5. Make it more robust against deliberate abuse/starvation (this should
> matter more once we have some kind of paravirtualization solution...)
>
> And right now we're somewhere between 2 and 3. So if there are cases
> where this resource acquisition stuff can cause a problem for real
> users, I'll want to fix it earlier. But if this is more theoretical than
> anything (with the resource limits of AGX GPUs), I'd rather focus on
> things like memory accounting and shrinker support first.
>
>>> We don't even have a shrinker yet, and I'm sure that's going to be a lot
>>> of fun when we add it too... but yes, if we can't do any memory
>>> allocations in some of these callbacks (is this documented anywhere?),
>>> that's going to be interesting...
>> Yes, that is all part of the dma_fence documentation. It's just
>> absolutely not obvious what all this means.
> I mean is there any documentation on how this interacts with drm_sched?
> Like, am I not allowed to allocate memory in prepare()? What about
> run()? What about GPU interrupt work? (not a raw IRQ handler context, I
> mean the execution path from GPU IRQ to drm_sched run() fences getting
> signaled)
>
>>> It's not all bad news though! All memory allocations are fallible in
>>> kernel Rust (and therefore explicit, and also failures have to be
>>> explicitly handled or propagated), so it's pretty easy to point out
>>> where they are, and there are already discussions of higher-level
>>> tooling to enforce rules like that (and things like wait contexts).
>>> Also, Rust makes it a lot easier to refactor code in general and not be
>>> scared that you're going to regress everything, so I'm not really
>>> worried if I need to turn a chunk of the driver on its head to solve
>>> some of these problems in the future ^^ (I already did that when I
>>> switched it from the "demo" synchronous submission model to the proper
>>> explicit sync + fences one.)
>> Yeah, well, the problem isn't that you run into a memory allocation failure.
> What I mean is that the mandatory failure handling means it's relatively
> easy to audit where memory allocations can actually happen.
>
>> The problem is rather something like this:
>> 1. You try to allocate memory to signal your fence.
>> 2. This memory allocation can't be fulfilled and goes to sleep to wait
>> for reclaim.
>> 3. On another CPU, reclaim is running and, through the general-purpose
>> shrinker, a page fault or an MMU notifier, ends up waiting for your dma_fence.
>>
>> You don't even need to implement the shrinker for this to go boom
>> extremely easily.
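
[The three steps above as a sketch of the resulting deadlock:]

    /* CPU 0: fence-signaling path */
    data = kmalloc(sizeof(*data), GFP_KERNEL); /* blocks in reclaim */
    dma_fence_signal(fence);                   /* never reached */

    /* CPU 1: reclaim (shrinker, page fault or MMU notifier) */
    dma_fence_wait(fence, false);              /* waits on the fence
                                                * CPU 0 was about to
                                                * signal: deadlock */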
> Hmm, can you actually get something waiting on a dma_fence like that
> today with this driver? We don't have a shrinker, we don't have
> synchronous page faults or MMU notifications for the GPU, and this is
> explicit sync, so all in/out fences cross over into userspace and
> surely can't be trusted anyway?
>
> I'm definitely not familiar with the intricacies of DMA fences and how
> they interact with everything else yet, but it's starting to sound like
> either this isn't quite broken for our simple driver yet, or it must be
> broken pretty much everywhere in some way...
>
>> So everything involved with signaling the fence can allocate memory only
>> with GFP_ATOMIC and only if you absolutely have to.
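
[In practice that constraint looks like the snippet below: GFP_ATOMIC
never sleeps, so it cannot recurse into reclaim, but it draws on
reserve pools and fails readily, so the failure path must still make
forward progress. The struct name is hypothetical.]

    struct done_cmd *entry = kzalloc(sizeof(*entry), GFP_ATOMIC);

    if (!entry)
            return -ENOMEM; /* caller must cope without the alloc */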
> I don't think we even have a good story for passing around gfp_flags in
> Rust code, so that will be interesting... though I need to actually audit
> the code paths and see how many allocations we really do. I know I alloc
> some vectors for holding completed commands and stuff like that, but I'm
> pretty sure I can fix that one with some reworking, and I'm not sure how
> many other random things there really are...? Obviously most allocations
> happen at command creation time, on completion you mostly get a lot of
> freeing, so maybe I can just eliminate all allocs and not worry about
> GFP_ATOMIC.
>
> ~~ Lina