Re: [Intel-xe] [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences

Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Daniel Vetter" <daniel@ffwll.ch>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Christian König" <christian.koenig@amd.com>
Cc: robdclark@chromium.org, airlied@linux.ie, lina@asahilina.net,
	"Thomas Hellström (Intel)" <thomas_os@shipmail.org>,
	dri-devel@lists.freedesktop.org,
	"Christian König" <christian.koenig@amd.com>,
	boris.brezillon@collabora.com, intel-xe@lists.freedesktop.org,
	faith.ekstrand@collabora.com
Subject: Re: [Intel-xe] [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences
Date: Wed, 5 Apr 2023 14:35:46 +0200	[thread overview]
Message-ID: <cece14bb-d3c5-3772-855e-bac0bb7766d7@linux.intel.com> (raw)
In-Reply-To: <ZCx5wGTA3RCZGjA4@phenom.ffwll.local>

Hi,

On 4/4/23 21:25, Daniel Vetter wrote:
> On Tue, Apr 04, 2023 at 07:02:23PM +0000, Matthew Brost wrote:
>> On Tue, Apr 04, 2023 at 08:14:01PM +0200, Thomas Hellström (Intel) wrote:
>>> On 4/4/23 15:10, Christian König wrote:
>>>> Am 04.04.23 um 14:54 schrieb Thomas Hellström:
>>>>> Hi, Christian,
>>>>>
>>>>> On 4/4/23 11:09, Christian König wrote:
>>>>>> Am 04.04.23 um 02:22 schrieb Matthew Brost:
>>>>>>> From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>>>>>
>>>>>>> For long-running workloads, drivers either need to open-code
>>>>>>> completion
>>>>>>> waits, invent their own synchronization primitives or internally use
>>>>>>> dma-fences that do not obey the cross-driver dma-fence protocol, but
>>>>>>> without any lockdep annotation all these approaches are error prone.
>>>>>>>
>>>>>>> So since for example the drm scheduler uses dma-fences it is
>>>>>>> desirable for
>>>>>>> a driver to be able to use it for throttling and error
>>>>>>> handling also with
>>>>>>> internal dma-fences tha do not obey the cros-driver
>>>>>>> dma-fence protocol.
>>>>>>>
>>>>>>> Introduce long-running completion fences in form of
>>>>>>> dma-fences, and add
>>>>>>> lockdep annotation for them. In particular:
>>>>>>>
>>>>>>> * Do not allow waiting under any memory management locks.
>>>>>>> * Do not allow to attach them to a dma-resv object.
>>>>>>> * Introduce a new interface for adding callbacks making the
>>>>>>> helper adding
>>>>>>>     a callback sign off on that it is aware that the dma-fence may not
>>>>>>>     complete anytime soon. Typically this will be the
>>>>>>> scheduler chaining
>>>>>>>     a new long-running fence on another one.
>>>>>> Well that's pretty much what I tried before:
>>>>>> https://lwn.net/Articles/893704/
>>>>>>
>> I don't think this quite the same, this explictly enforces that we don't
>> break the dma-fence rules (in path of memory allocations, exported in
>> any way), essentially this just SW sync point reusing dma-fence the
>> infrastructure for signaling / callbacks. I believe your series tried to
>> export these fences to user space (admittedly I haven't fully read your
>> series).
>>
>> In this use case we essentially just want to flow control the ring via
>> the dma-scheduler + maintain a list of pending jobs so the TDR can be
>> used for cleanup if LR entity encounters an error. To me this seems
>> perfectly reasonable but I know dma-femce rules are akin to a holy war.
>>
>> If we return NULL in run_job, now we have to be able to sink all jobs
>> in the backend regardless on ring space, maintain a list of jobs pending
>> for cleanup after errors, and write a different cleanup path as now the
>> TDR doesn't work. Seems very, very silly to duplicate all of this code
>> when the DRM scheduler provides all of this for us. Also if we go this
>> route, now all drivers are going to invent ways to handle LR jobs /w the
>> DRM scheduler.
>>
>> This solution is pretty clear, mark the scheduler as LR, and don't
>> export any fences from the scheduler. If you try to export these fences
>> a blow up happens.
> The problem is if you mix things up. Like for resets you need all the
> schedulers on an engine/set-of-engines to quiescent or things get
> potentially hilarious. If you now have a scheduler in forever limbo, the
> dma_fence guarantees are right out the window.
>
> But the issue you're having is fairly specific if it's just about
> ringspace. I think the dumbest fix is to just block in submit if you run
> out of per-ctx ringspace, and call it a day. This notion that somehow the
> kernel is supposed to provide a bottomless queue of anything userspace
> submits simply doesn't hold up in reality (as much as userspace standards
> committees would like it to), and as long as it doesn't have a real-world
> perf impact it doesn't really matter why we end up blocking in the submit
> ioctl. It might also be a simple memory allocation that hits a snag in
> page reclaim.

So it seems the discussion around the long-running synchronization 
diverged a bit between threads and this thread was hijacked for 
preempt-fences and userptr.

Do I understand it correctly that the recommendation from both Daniel 
and Christian is to *not* use the drm scheduler for long-running compute 
jobs, but track any internal dma-fence dependencies (pipelined clearing 
or whatever) in a separate mechanism and handle unresolved dependencies 
on other long-running jobs using -EWOULDBLOCK?

Thanks,
Thomas





>>>>>> And the reasons why it was rejected haven't changed.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>> Yes, TBH this was mostly to get discussion going how we'd best
>>>>> tackle this problem while being able to reuse the scheduler for
>>>>> long-running workloads.
>>>>>
>>>>> I couldn't see any clear decision on your series, though, but one
>>>>> main difference I see is that this is intended for driver-internal
>>>>> use only. (I'm counting using the drm_scheduler as a helper for
>>>>> driver-private use). This is by no means a way to try tackle the
>>>>> indefinite fence problem.
>>>> Well this was just my latest try to tackle this, but essentially the
>>>> problems are the same as with your approach: When we express such
>>>> operations as dma_fence there is always the change that we leak that
>>>> somewhere.
>>>>
>>>> My approach of adding a flag noting that this operation is dangerous and
>>>> can't be synced with something memory management depends on tried to
>>>> contain this as much as possible, but Daniel still pretty clearly
>>>> rejected it (for good reasons I think).
>>>>
>>>>> We could ofc invent a completely different data-type that abstracts
>>>>> the synchronization the scheduler needs in the long-running case, or
>>>>> each driver could hack something up, like sleeping in the
>>>>> prepare_job() or run_job() callback for throttling, but those waits
>>>>> should still be annotated in one way or annotated one way or another
>>>>> (and probably in a similar way across drivers) to make sure we don't
>>>>> do anything bad.
>>>>>
>>>>>   So any suggestions as to what would be the better solution here
>>>>> would be appreciated.
>>>> Mhm, do we really the the GPU scheduler for that?
>>>>
>> I think we need to solve this within the DRM scheduler one way or
>> another.
> Yeah so if we conclude that the queue really must be bottomless then I
> agree drm-sched should help out sort out the mess. Because I'm guessing
> that every driver will have this issue. But that's a big if.
>
> I guess if we teach the drm scheduler that some jobs are fairly endless
> then maybe it wouldn't be too far-fetched to also teach it to wait for a
> previous one to finish (but not with the dma_fence that preempts, which we
> put into the dma_resv for memory management, but some other struct
> completion). The scheduler already has a concept of not stuffing too much
> stuff into the same queue after all, so this should fit?
> -Daniel
>
>
>>>> I mean in the 1 to 1 case  you basically just need a component which
>>>> collects the dependencies as dma_fence and if all of them are fulfilled
>>>> schedules a work item.
>>>>
>>>> As long as the work item itself doesn't produce a dma_fence it can then
>>>> still just wait for other none dma_fence dependencies.
>>>>
>>>> Then the work function could submit the work and wait for the result.
>>>>
>>>> The work item would then pretty much represent what you want, you can
>>>> wait for it to finish and pass it along as long running dependency.
>>>>
>>>> Maybe give it a funky name and wrap it up in a structure, but that's
>>>> basically it.
>>>>
>>> This very much sounds like a i915_sw_fence for the dependency tracking and
>>> dma_fence_work for the actual work although it's completion fence is a
>>> dma_fence.
>>>
>> Agree this does sound to i915ish as stated below one of mandates in Xe
>> was to use the DRM scheduler. Beyond that as someone who a submission
>> backend in the i915 and Xe, I love how the DRM scheduler works (single
>> entry point), it makes everything so much easier.
>>
>> Matt
>>
>>> Although that goes against the whole idea of a condition for merging the xe
>>> driver would be that we implement some sort of minimal scaffolding for
>>> long-running workloads in the drm scheduler, and the thinking behind that is
>>> to avoid implementing intel-specific solutions like those...
>>>
>>> Thanks,
>>>
>>> Thomas
>>>
>>>
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Thanks,
>>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>

next prev parent reply	other threads:[~2023-04-05 12:35 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-04  0:22 [Intel-xe] [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 01/10] drm/sched: Convert drm scheduler to use a work queue rather than kthread Matthew Brost
2023-06-09  6:58   ` Boris Brezillon
2023-07-31  0:56     ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 02/10] drm/sched: Move schedule policy to scheduler / entity Matthew Brost
2023-04-05 17:37   ` Luben Tuikov
2023-04-05 18:29     ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 03/10] drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 04/10] drm/sched: Add generic scheduler message interface Matthew Brost
2023-05-04  5:28   ` Luben Tuikov
2023-07-31  2:42     ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 05/10] drm/sched: Start run wq before TDR in drm_sched_start Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 06/10] drm/sched: Submit job before starting TDR Matthew Brost
2023-05-04  5:23   ` Luben Tuikov
2023-07-31  1:00     ` Matthew Brost
2023-07-31  7:26       ` Boris Brezillon
2023-08-31 19:48         ` Luben Tuikov
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 07/10] drm/sched: Add helper to set TDR timeout Matthew Brost
2023-05-04  5:28   ` Luben Tuikov
2023-07-31  1:09     ` Matthew Brost
2023-08-31 19:52       ` Luben Tuikov
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences Matthew Brost
2023-04-04  9:09   ` Christian König
2023-04-04 12:54     ` Thomas Hellström
2023-04-04 13:10       ` Christian König
2023-04-04 18:14         ` Thomas Hellström (Intel)
2023-04-04 19:02           ` Matthew Brost
2023-04-04 19:25             ` Daniel Vetter
2023-04-04 19:48               ` Matthew Brost
2023-04-05 13:09                 ` Daniel Vetter
2023-04-05 23:58                   ` Matthew Brost
2023-04-06  6:32                     ` Daniel Vetter
2023-04-06 16:58                       ` Matthew Brost
2023-04-06 17:09                         ` Daniel Vetter
2023-04-05 12:35               ` Thomas Hellström [this message]
2023-04-05 12:39                 ` Christian König
2023-04-05 12:45                   ` Daniel Vetter
2023-04-05 14:08                     ` Christian König
2023-04-04 19:00         ` Daniel Vetter
2023-04-04 20:03           ` Matthew Brost
2023-04-04 20:11             ` Daniel Vetter
2023-04-04 20:19               ` Matthew Brost
2023-04-04 20:31                 ` Daniel Vetter
2023-04-04 20:46                   ` Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 09/10] drm/sched: Support long-running sched entities Matthew Brost
2023-04-04  0:22 ` [Intel-xe] [RFC PATCH 10/10] drm/syncobj: Warn on long running dma-fences Matthew Brost
2023-04-04  0:24 ` [Intel-xe] ✗ CI.Patch_applied: failure for Xe DRM scheduler and long running workload plans Patchwork
2023-04-04  1:07 ` [Intel-xe] [RFC PATCH 00/10] " Asahi Lina
2023-04-04  1:58   ` Matthew Brost
2023-04-08  7:05     ` Asahi Lina
2023-04-11 14:07       ` Daniel Vetter
2023-04-12  5:47         ` Asahi Lina
2023-04-12  8:18           ` Daniel Vetter
2023-04-17  0:03       ` Matthew Brost
2023-04-04  9:04 ` Christian König
2023-04-04 13:23   ` Matthew Brost
2023-04-04  9:13 ` Christian König
2023-04-04 13:37   ` Matthew Brost
2023-04-05  7:41     ` Christian König
2023-04-05  8:34       ` Daniel Vetter
2023-04-05  8:53         ` Christian König
2023-04-05  9:07           ` Daniel Vetter
2023-04-05  9:57             ` Christian König
2023-04-05 10:12               ` Daniel Vetter
2023-04-06  2:08                 ` Matthew Brost
2023-04-06  6:37                   ` Daniel Vetter
2023-04-06 10:14                     ` Christian König
2023-04-06 10:32                       ` Daniel Vetter
2023-04-04  9:43 ` Tvrtko Ursulin
2023-04-04  9:48   ` Christian König
2023-04-04 13:43     ` Matthew Brost
2023-04-04 13:52   ` Matthew Brost
2023-04-04 17:29     ` Tvrtko Ursulin
2023-04-04 19:07       ` Daniel Vetter
2023-04-04 18:02 ` Zeng, Oak
2023-04-04 18:08   ` Matthew Brost
2023-04-05  7:30     ` Christian König
2023-04-05  8:42       ` Daniel Vetter
2023-04-05 18:06       ` Zeng, Oak
2023-04-05 18:53         ` Matthew Brost
2023-04-06 10:04           ` Christian König
2023-04-07  0:20           ` Zeng, Oak
2023-04-11  9:02             ` Christian König
2023-04-11 14:13               ` Daniel Vetter
2023-04-17  6:47                 ` Christian König
2023-04-17  8:39                   ` Daniel Vetter
2023-04-18 15:10 ` Liviu Dudau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cece14bb-d3c5-3772-855e-bac0bb7766d7@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=airlied@linux.ie \
    --cc=boris.brezillon@collabora.com \
    --cc=christian.koenig@amd.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=faith.ekstrand@collabora.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=lina@asahilina.net \
    --cc=matthew.brost@intel.com \
    --cc=robdclark@chromium.org \
    --cc=thomas_os@shipmail.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox