From: "Christian König" <christian.koenig@amd.com>
To: "Thomas Hellström (Intel)" <thomas_os@shipmail.org>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org, matthew.auld@intel.com
Subject: Re: [Intel-gfx] [Linaro-mm-sig] [RFC PATCH 1/2] dma-fence: Avoid establishing a locking order between fence classes
Date: Wed, 1 Dec 2021 12:25:09 +0100 [thread overview]
Message-ID: <a4df4d5f-ea74-8725-aca9-d0edae986e5c@amd.com> (raw)
In-Reply-To: <94435e0e-01db-5ae4-e424-64f73a09199f@shipmail.org>
On 01.12.21 12:04, Thomas Hellström (Intel) wrote:
>
> On 12/1/21 11:32, Christian König wrote:
>> On 01.12.21 11:15, Thomas Hellström (Intel) wrote:
>>> [SNIP]
>>>>
>>>> What we could do is to avoid all this by not calling the callback
>>>> with the lock held in the first place.
>>>
>>> If that's possible that might be a good idea, pls also see below.
>>
>> The problem with that is
>> dma_fence_signal_locked()/dma_fence_signal_timestamp_locked(). If we
>> could avoid using that or at least allow it to drop the lock then we
>> could call the callback without holding it.
>>
>> Somebody would need to audit the drivers and see if holding the lock
>> is really necessary anywhere.
>>
>>>>
>>>>>>
>>>>>>>>
>>>>>>>> /Thomas
>>>>>>>
>>>>>>> Oh, and a follow up question:
>>>>>>>
>>>>>>> If there was a way to break the recursion on final put() (using
>>>>>>> the same basic approach as patch 2 in this series uses to break
>>>>>>> recursion in enable_signaling()), so that none of these
>>>>>>> containers did require any special treatment, would it be worth
>>>>>>> pursuing? I guess it might be possible by having the callbacks
>>>>>>> drop the references rather than the loop in the final put. + a
>>>>>>> couple of changes in code iterating over the fence pointers.
>>>>>>
>>>>>> That won't really help, you just move the recursion from the
>>>>>> final put into the callback.
>>>>>
>>>>> How do we recurse from the callback? The introduced fence_put() of
>>>>> individual fence pointers
>>>>> doesn't recurse anymore (at most 1 level), and any callback
>>>>> recursion is broken by the irq_work?
>>>>
>>>> Yeah, but then you would need to take another lock to avoid racing
>>>> with dma_fence_array_signaled().
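The recursion-breaking idea under discussion, a final put that reaps zero-refcount objects iteratively instead of recursing into each child, can be sketched as follows. All names and structures are invented for the sketch; it deliberately ignores the locking/racing concern raised above:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model (invented names, not the dma_fence API): instead of
 * the final put recursively dropping each child reference -- which for
 * nested containers means stack depth proportional to nesting depth --
 * nodes that reach a refcount of zero are queued and reaped in a flat
 * loop.
 */

struct node {
	int refcount;
	struct node *child;	/* reference this node holds */
	struct node *reap_next;	/* linkage for the reap queue */
};

static struct node *reap_list;

static void node_put(struct node *n)
{
	/* Iterative walk: constant stack depth however deep the chain. */
	while (n && --n->refcount == 0) {
		n->reap_next = reap_list;
		reap_list = n;
		n = n->child;	/* drop the reference we held */
	}
}
```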
>>>>
>>>>>
>>>>> I figure the big amount of work would be to adjust code that
>>>>> iterates over the individual fence pointers to recognize that they
>>>>> are rcu protected.
>>>>
>>>> Could be that we could solve this with RCU, but that sounds like a
>>>> lot of churn for no gain at all.
>>>>
>>>> In other words even with the problems solved I think it would be a
>>>> really bad idea to allow chaining of dma_fence_array objects.
>>>
>>> Yes, that was really the question: is it worth pursuing this? I'm
>>> not really suggesting we should allow this as an intentional
>>> feature. I'm worried, however, that if we allow these containers to
>>> start floating around cross-driver (or even internally) disguised as
>>> ordinary dma_fences, they would require a lot of driver special
>>> casing, or else completely unexpected WARN_ON()s and lockdep splats
>>> would start to turn up, scaring people off from using them. And that
>>> would be a breeding ground for hairy driver-private constructs.
>>
>> Well the question is why we would want to do it?
>>
>> If it's to avoid inter-driver lock dependencies by not calling the
>> callback with the spinlock held, then yes, please. We had tons of
>> problems with that, resulting in irq_work and work_item delegation
>> all over the place.
>
> Yes, that sounds like something desirable, but in these containers,
> what's causing the lock dependencies is the enable_signaling()
> callback that is typically called locked.
>
>
>>
>> If it's to allow nesting of dma_fence_array instances, then it's most
>> likely a really bad idea even if we fix all the locking order problems.
>
> Well I think my use-case where I hit a dead end may illustrate what
> worries me here:
>
> 1) We use a dma-fence-array to coalesce all dependencies for ttm
> object migration.
> 2) We use a dma-fence-chain to order the resulting dma_fence into a
> timeline because the TTM resource manager code requires that.
>
> Initially this seemed harmless to me.
>
> But after a sequence evict->alloc->clear, the dma-fence-chain feeds
> into the dma-fence-array for the clearing operation. Code still works
> fine, and no deep recursion, no warnings. But if I were to add another
> driver to the system that instead feeds a dma-fence-array into a
> dma-fence-chain, this would give me a lockdep splat.
>
> So then if somebody were to come up with the splendid idea of using a
> dma-fence-chain to initially coalesce fences, I'd hit the same problem
> or risk illegally joining two dma-fence-chains together.
>
> To fix this, I would need to look at the incoming fences and iterate
> over any dma-fence-array or dma-fence-chain that is fed into the
> dma-fence-array to flatten out the input. In fact all dma-fence-array
> users would need to do that, and even dma-fence-chain users would need
> to watch out for joining chains together or accidentally adding an
> array that came in as a disguised dma-fence from another driver.
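The flattening described above can be modeled in plain userspace C: when building a container, expand any container passed as input into its leaves, so the result only ever holds plain fences and containers never nest. The types below are invented for the sketch and are not the kernel API:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace sketch of container flattening: recurse into any
 * container input and collect only the leaf fences, so the container
 * being built cannot end up with another container inside it.
 */

struct mfence {
	int is_container;
	struct mfence **fences;	/* children, when is_container */
	size_t num_fences;
};

/* Append every leaf reachable from f to out[], return the new count. */
static size_t flatten(struct mfence *f, struct mfence **out, size_t n)
{
	size_t i;

	if (!f->is_container) {
		out[n] = f;
		return n + 1;
	}
	for (i = 0; i < f->num_fences; i++)
		n = flatten(f->fences[i], out, n);
	return n;
}
```

The cost is exactly the point made in the mail: every container user has to do this at construction time, instead of it happening once in common code.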
>
> So the purpose to me would be to allow these containers as input to
> each other without a lot of in-driver special-casing, be it by breaking
> recursion or by built-in flattening, in order to avoid
>
> a) Hitting issues in the future or with existing interoperating drivers.
> b) Driver-private containers that also might break interoperability.
> (For example, i915's currently driver-private dma_fence_work avoids
> all these problems, but we're attempting to address issues in common
> code rather than re-inventing stuff internally.)
I don't think that a dma_fence_array or dma_fence_chain is the right
thing to begin with in those use cases.
When you want to coalesce the dependencies for a job you could either
use an xarray like Daniel did for the scheduler or some hashtable like
we use in amdgpu. But I don't see the need for exposing the dma_fence
interface for those.
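The coalescing alternative suggested here, keyed by fence context rather than wrapping fences in a container, can be sketched in plain userspace C. This is loosely modeled on the idea behind amdgpu's sync hashtable; all names below are invented and this is not the amdgpu code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace sketch of dependency coalescing keyed by fence context:
 * a later seqno on the same timeline implies the earlier ones, so
 * only the latest fence per context needs to be kept.  No dma_fence
 * container object is involved at all.
 */

struct dep {
	unsigned long context;
	unsigned long seqno;
};

struct dep_set {
	struct dep deps[16];	/* fixed-size table for the sketch */
	size_t count;
};

static void dep_set_add(struct dep_set *s, unsigned long context,
			unsigned long seqno)
{
	size_t i;

	for (i = 0; i < s->count; i++) {
		if (s->deps[i].context == context) {
			/* Same timeline: keep only the later fence. */
			if (seqno > s->deps[i].seqno)
				s->deps[i].seqno = seqno;
			return;
		}
	}
	s->deps[s->count].context = context;
	s->deps[s->count].seqno = seqno;
	s->count++;
}
```

Since the set is never handed out as a fence, none of the container nesting or lock-ordering questions above arise for it.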
And why do you use dma_fence_chain to generate a timeline for TTM? That
should come naturally because all the moves must be ordered.
Regards,
Christian.
Thread overview: 34+ messages
2021-11-30 12:19 [Intel-gfx] [RFC PATCH 0/2] Attempt to avoid dma-fence-[chain|array] lockdep splats Thomas Hellström
2021-11-30 12:19 ` [Intel-gfx] [RFC PATCH 1/2] dma-fence: Avoid establishing a locking order between fence classes Thomas Hellström
2021-11-30 12:25 ` Maarten Lankhorst
2021-11-30 12:31 ` Thomas Hellström
2021-11-30 12:42 ` Christian König
2021-11-30 12:56 ` Thomas Hellström
2021-11-30 13:26 ` Christian König
2021-11-30 14:35 ` Thomas Hellström
2021-11-30 15:02 ` Christian König
2021-11-30 18:12 ` Thomas Hellström
2021-11-30 19:27 ` Thomas Hellström
2021-12-01 7:05 ` Christian König
2021-12-01 8:23 ` [Intel-gfx] [Linaro-mm-sig] " Thomas Hellström (Intel)
2021-12-01 8:36 ` Christian König
2021-12-01 10:15 ` Thomas Hellström (Intel)
2021-12-01 10:32 ` Christian König
2021-12-01 11:04 ` Thomas Hellström (Intel)
2021-12-01 11:25 ` Christian König [this message]
2021-12-01 12:16 ` Thomas Hellström (Intel)
2021-12-03 13:08 ` Christian König
2021-12-03 14:18 ` Thomas Hellström
2021-12-03 14:26 ` Christian König
2021-12-03 14:50 ` Thomas Hellström
2021-12-03 15:00 ` Christian König
2021-12-03 15:13 ` Thomas Hellström (Intel)
2021-12-07 18:08 ` Daniel Vetter
2021-12-07 20:46 ` Thomas Hellström
2021-12-20 9:37 ` Daniel Vetter
2021-11-30 12:32 ` [Intel-gfx] " Thomas Hellström
2021-11-30 12:19 ` [Intel-gfx] [RFC PATCH 2/2] dma-fence: Avoid excessive recursive fence locking from enable_signaling() callbacks Thomas Hellström
2021-11-30 12:36 ` [Intel-gfx] [RFC PATCH 0/2] Attempt to avoid dma-fence-[chain|array] lockdep splats Christian König
2021-11-30 13:05 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for " Patchwork
2021-11-30 13:48 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-11-30 17:47 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork