From: "Thomas Hellström (Intel)" <thomas_os@shipmail.org>
To: "Christian König" <christian.koenig@amd.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org, matthew.auld@intel.com
Subject: Re: [Intel-gfx] [Linaro-mm-sig] [RFC PATCH 1/2] dma-fence: Avoid establishing a locking order between fence classes
Date: Wed, 1 Dec 2021 12:04:24 +0100
Message-ID: <94435e0e-01db-5ae4-e424-64f73a09199f@shipmail.org>
In-Reply-To: <4805074d-7039-3eaf-eb5d-5797278b7f31@amd.com>
On 12/1/21 11:32, Christian König wrote:
> Am 01.12.21 um 11:15 schrieb Thomas Hellström (Intel):
>> [SNIP]
>>>
>>> What we could do is to avoid all this by not calling the callback
>>> with the lock held in the first place.
>>
>> If that's possible that might be a good idea, pls also see below.
>
> The problem with that is
> dma_fence_signal_locked()/dma_fence_signal_timestamp_locked(). If we
> could avoid using that or at least allow it to drop the lock then we
> could call the callback without holding it.
>
> Somebody would need to audit the drivers and see if holding the lock
> is really necessary anywhere.
>
>>>
>>>>>
>>>>>>>
>>>>>>> /Thomas
>>>>>>
>>>>>> Oh, and a follow up question:
>>>>>>
>>>>>> If there was a way to break the recursion on final put() (using
>>>>>> the same basic approach as patch 2 in this series uses to break
>>>>>> recursion in enable_signaling()), so that none of these
>>>>>> containers did require any special treatment, would it be worth
>>>>>> pursuing? I guess it might be possible by having the callbacks
>>>>>> drop the references rather than the loop in the final put. + a
>>>>>> couple of changes in code iterating over the fence pointers.
>>>>>
>>>>> That won't really help, you just move the recursion from the final
>>>>> put into the callback.
>>>>
>>>> How do we recurse from the callback? The introduced fence_put() of
>>>> individual fence pointers
>>>> doesn't recurse anymore (at most 1 level), and any callback
>>>> recursion is broken by the irq_work?
>>>
>>> Yeah, but then you would need to take another lock to avoid racing
>>> with dma_fence_array_signaled().
>>>
>>>>
>>>> I figure the bulk of the work would be to adjust code that
>>>> iterates over the individual fence pointers to recognize that they
>>>> are rcu protected.
>>>
>>> Could be that we could solve this with RCU, but that sounds like a
>>> lot of churn for no gain at all.
>>>
>>> In other words even with the problems solved I think it would be a
>>> really bad idea to allow chaining of dma_fence_array objects.
>>
>> Yes, that was really the question: is it worth pursuing this? I'm not
>> really suggesting we should allow this as an intentional feature. I'm
>> worried, however, that if we allow these containers to start floating
>> around cross-driver (or even internally) disguised as ordinary
>> dma_fences, they would require a lot of driver special casing, or
>> else completely unexpected WARN_ON()s and lockdep splats would start
>> to turn up, scaring people off from using them. And that would be a
>> breeding ground for hairy driver-private constructs.
>
> Well the question is why we would want to do it?
>
> If it's to avoid inter-driver lock dependencies by not calling the
> callback with the spinlock held, then yes please. We had tons of
> problems with that, resulting in irq_work and work_item delegation all
> over the place.
Yes, that sounds like something desirable, but in these containers,
what's causing the lock dependencies is the enable_signaling() callback
that is typically called locked.
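
To illustrate the idea of not calling callbacks with the fence lock held,
here is a minimal single-threaded userspace sketch. All names (toy_lock,
toy_cb, toy_fence, toy_waiter and the toy_* functions) are hypothetical and
are not the kernel dma_fence API; the lock is modelled as a plain flag. It
only covers the signal path and does not address enable_signaling() being
called locked, which is the case raised above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-threaded model; hypothetical names, NOT the kernel API. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

struct toy_cb {
	struct toy_cb *next;
	void (*func)(struct toy_cb *cb);
};

struct toy_fence {
	struct toy_lock lock;
	int signaled;
	struct toy_cb *cb_list;
};

static void toy_fence_init(struct toy_fence *f)
{
	f->lock.held = 0;
	f->signaled = 0;
	f->cb_list = NULL;
}

/* Returns 0 if the callback was queued, -1 if the fence already signaled. */
static int toy_fence_add_callback(struct toy_fence *f, struct toy_cb *cb,
				  void (*func)(struct toy_cb *cb))
{
	int ret = 0;

	toy_lock_acquire(&f->lock);
	if (f->signaled) {
		ret = -1;
	} else {
		cb->func = func;
		cb->next = f->cb_list;
		f->cb_list = cb;
	}
	toy_lock_release(&f->lock);
	return ret;
}

/* Detach the callback list under the lock, run the callbacks after unlock. */
static void toy_fence_signal(struct toy_fence *f)
{
	struct toy_cb *cb, *next;

	toy_lock_acquire(&f->lock);
	if (f->signaled) {
		toy_lock_release(&f->lock);
		return;
	}
	f->signaled = 1;
	cb = f->cb_list;
	f->cb_list = NULL;
	toy_lock_release(&f->lock);

	for (; cb; cb = next) {
		next = cb->next;	/* the callback may reuse or free cb */
		cb->func(cb);
	}
}

/* Example waiter that records whether the fence lock was held in the callback. */
struct toy_waiter {
	struct toy_cb cb;	/* must stay the first member for the cast below */
	struct toy_fence *fence;
	int called;
	int lock_held_in_cb;
};

static void toy_waiter_func(struct toy_cb *cb)
{
	struct toy_waiter *w = (struct toy_waiter *)cb;

	w->called++;
	w->lock_held_in_cb = w->fence->lock.held;
}
```

Because the callback runs after the lock is released, it is free to take
other fence locks without establishing an order against this one. The cost
in this sketch is that a "locked" signal variant is no longer possible
without dropping the lock first.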
>
> If it's to allow nesting of dma_fence_array instances, then it's most
> likely a really bad idea even if we fix all the locking order problems.
Well I think my use-case where I hit a dead end may illustrate what
worries me here:
1) We use a dma-fence-array to coalesce all dependencies for ttm object
migration.
2) We use a dma-fence-chain to order the resulting dma_fence into a
timeline because the TTM resource manager code requires that.
Initially seemingly harmless to me.
But after a sequence evict->alloc->clear, the dma-fence-chain feeds into
the dma-fence-array for the clearing operation. Code still works fine,
and no deep recursion, no warnings. But if I were to add another driver
to the system that instead feeds a dma-fence-array into a
dma-fence-chain, this would give me a lockdep splat.
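
The ordering problem in this use-case can be modelled with a toy lock-class
tracker. This is a minimal userspace sketch, not lockdep itself: it only
records direct "class B taken while class A held" pairs, whereas real
lockdep validates a full dependency graph. It assumes a container takes a
member fence's lock while its own lock is held, as with the locked
enable_signaling() call mentioned earlier. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Toy lock classes; hypothetical names for illustration only. */
enum toy_class {
	TOY_CLASS_ARRAY,
	TOY_CLASS_CHAIN,
	TOY_NUM_CLASSES
};

/* seen[a][b] != 0 means: a lock of class b was taken while class a was held. */
struct toy_lockdep {
	int seen[TOY_NUM_CLASSES][TOY_NUM_CLASSES];
};

static void toy_lockdep_init(struct toy_lockdep *ld)
{
	memset(ld, 0, sizeof(*ld));
}

/*
 * Record that a lock of class "taken" is acquired while a lock of class
 * "held" is held. Returns 1 if the opposite order was recorded earlier
 * (the point where a splat would be expected), 0 otherwise.
 */
static int toy_lockdep_nest(struct toy_lockdep *ld, enum toy_class held,
			    enum toy_class taken)
{
	int inversion = ld->seen[taken][held];

	ld->seen[held][taken] = 1;
	return inversion;
}
```

With this model, a driver whose array contains a chain records
ARRAY -> CHAIN and sees no inversion, however often it repeats. The first
time any other code in the system lets a chain contain an array, the
CHAIN -> ARRAY pair conflicts with the recorded order and
toy_lockdep_nest() returns 1.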
So then if somebody were to come up with the splendid idea of using a
dma-fence-chain to initially coalesce fences, I'd hit the same problem
or risk illegally joining two dma-fence-chains together.
To fix this, I would need to look at the incoming fences and iterate
over any dma-fence-array or dma-fence-chain that is fed into the
dma-fence-array to flatten out the input. In fact, all dma-fence-array
users would need to do that, and even dma-fence-chain users would need
to watch out for joining chains together or accidentally adding an array
that perhaps came as a disguised dma-fence from another driver.
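
The flattening step described above could look roughly like the following
userspace sketch. The flat_fence type, the FLAT_* kinds and flat_collect()
are hypothetical stand-ins, not the kernel's dma_fence_array or
dma_fence_chain structures. The sketch walks chain links iteratively and
recurses only on nesting depth; it does no reference counting and no
de-duplication.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy fence kinds; NOT the kernel data structures. */
enum flat_kind { FLAT_PLAIN, FLAT_ARRAY, FLAT_CHAIN };

struct flat_fence {
	enum flat_kind kind;
	/* FLAT_ARRAY: member fences */
	struct flat_fence **fences;
	size_t num_fences;
	/* FLAT_CHAIN: fence carried by this link, and the previous link */
	struct flat_fence *fence;
	struct flat_fence *prev;
};

/*
 * Append every plain fence reachable from f to out[], looking through
 * arrays and chains. *n is the number of entries already in out[].
 * Returns 0 on success, -1 if out[] (capacity cap) is too small.
 */
static int flat_collect(struct flat_fence *f, struct flat_fence **out,
			size_t cap, size_t *n)
{
	size_t i;

	while (f) {
		switch (f->kind) {
		case FLAT_PLAIN:
			if (*n == cap)
				return -1;
			out[(*n)++] = f;
			return 0;
		case FLAT_ARRAY:
			for (i = 0; i < f->num_fences; i++)
				if (flat_collect(f->fences[i], out, cap, n))
					return -1;
			return 0;
		case FLAT_CHAIN:
			if (flat_collect(f->fence, out, cap, n))
				return -1;
			f = f->prev;	/* walk the chain without recursing */
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

A caller would run flat_collect() over its incoming dependencies and build
its own container from the resulting plain fences only, so that no
container ever ends up as a member of another one.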
So the purpose to me would be to allow these containers as input to
each other without a lot of in-driver special-casing, be it by breaking
recursion or by built-in flattening, to avoid
a) Hitting issues in the future or with existing interoperating drivers.
b) Driver-private containers that also might break the
interoperability. (For example, the currently driver-private i915
dma_fence_work avoids all these problems, but we're attempting to address
issues in common code rather than re-inventing stuff internally).
/Thomas
>
> Christian.
>
>>
>> /Thomas
>>
>>
>>>
>>> Christian.
>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> /Thomas
>>>>
>>>>