From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
Intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
Jason Ekstrand <jason.ekstrand@intel.com>
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Stop propagating fence errors by default
Date: Mon, 17 May 2021 16:33:45 +0100 [thread overview]
Message-ID: <596bdfdd-df65-2166-4839-86e6c3308772@linux.intel.com> (raw)
In-Reply-To: <YKKH1rRy2HN4Gnr8@phenom.ffwll.local>
On 17/05/2021 16:12, Daniel Vetter wrote:
> On Tue, May 11, 2021 at 10:05:27AM +0100, Tvrtko Ursulin wrote:
>>
>> On 10/05/2021 16:55, Daniel Vetter wrote:
>>> On Fri, May 07, 2021 at 09:35:21AM +0100, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> This is an alternative proposed fix for the below references bug report
>>>> where dma fence error propagation is causing undesirable change in
>>>> behaviour post GPU hang/reset.
>>>>
>>>> Approach in this patch is to simply stop propagating all dma fence errors
>>>> by default since that seems to be the upstream ask.
>>>>
>>>> To handle the case where i915 needs error propagation for security, I add
>>>> a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
>>>> the command parsing chain only.
>>>>
>>>> It sounds a plausible argument that fence propagation could be useful in
>>>> which case a core flag to enable opt-in should be universally useful.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
>>>> Reported-by: Miroslav Bendik
>>>> References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
>>>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
>>>> Cc: Jason Ekstrand <jason.ekstrand@intel.com>
>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>> ---
>>>> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
>>>> drivers/gpu/drm/i915/i915_sw_fence.c | 8 ++++----
>>>> drivers/gpu/drm/i915/i915_sw_fence.h | 8 ++++++++
>>>> include/linux/dma-fence.h | 1 +
>>>
>>> I still don't like this, least because we still introduce the concept of
>>> error propagation to dma-fence (but hey only in i915 code, which is
>>> exactly the kind of not-really-upstream approach we got a major chiding
>>> for).
>>>
>>> The only thing this does is make it explicitly opt-in instead opt-out,
>>> like the first fix. The right approach is imo still to just throw it out,
>>> and instead make the one error propagation we really need very, very
>>> explicit. Instead of hiding it behind lots of magic.
>>>
>>> The one error propagation we need is when the cmd parser work fails, it
>>> must cancel it's corresponding request to make sure the batchbuffer
>>> doesn't run. This should require about 2 lines in total:
>>>
>>> - one line to store the request so that the cmd parser work can access it.
>>> No refcounting needed, because the the request cannot even start (much
>>> less get freed) before the cmd parser has singalled its fence
>>>
>>> - one line to kill the request if the parsing fails. Maybe 2 if you
>>> include the if condition. I have no idea how that's done since I'm
>>> honestly lost how the i915 scheduler decides whether to run a batch or
>>> not. I'm guessing we have a version of this for the ringbuffer and the
>>> execlist backend (if not maybe gen7 cmdparser is broken?)
>>>
>>> I don't see any need for magic behind-the-scenes propagation of such a
>>> security critical error. Especially when that error propagation thing
>>> caused security bugs of its own, is an i915-only feature, and not
>>> motivated by any userspace/uapi requirements at all.
>>
>> I took this approach because to me propagating errors sounds more logical
>> than ignoring them and I was arguing in the commit message that the
>> infrastructure to enable that could be put in place as opt-in.
>>
>> I also do not see a lot of magic in this patch. Only thing, potentially the
>> logic should be inverted so that the waiter marks itself as interested in
>> receiving errors. That would probably make even more sense as a core
>> concept.
>>
>> Has there been a wider discussion on this topic in the past? I am curious to
>> know, even if propagation currently is i915 only, could other drivers be
>> interested.
>
> There hasn't been. i915-gem team decided "this is a cool concept", which
> resulted in a security bug. Now we're a few months in arguing whether a
> cool-looking concept that leads to a security bug is maybe a good idea,
> and whether we should sneak it in as a core concept to dma-buf.h without
> any wider discussion on the concept.
>
>> Note that it adds almost nothing to the dma-buf common code about a single
>> flag, and at some point (currently missing) documentation on the very flag.
>
> This is really not how upstream collaboration works, and it needs to stop.
>
> If you want this, start another thread arguing why this is a good idea,
> fully decoupled from the security fix here.
When I asked you whether you know there were past discussions on this
topic, clearly the point of that was to figure out whether a new
discussion needs to be started, or I need to go and read an existing one
to get up to speed.
I don't know how you interpreted that as an attempt to sneak anything
in. And I don't know how I could have reliably figured out the answer to
that question without asking. So colour me confused.
To clarify on the security issue part - are you talking about
https://gitlab.freedesktop.org/drm/intel/-/issues/3080, or the other
security issue, the one which would be caused by simply reverting the
error propagation in i915?
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2021-05-17 15:34 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-07 8:35 [Intel-gfx] [PATCH] drm/i915: Stop propagating fence errors by default Tvrtko Ursulin
2021-05-07 9:29 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2021-05-07 9:58 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-05-07 11:38 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2021-05-10 1:36 ` [Intel-gfx] [PATCH] " Jason Ekstrand
2021-05-10 15:55 ` Daniel Vetter
2021-05-11 9:05 ` Tvrtko Ursulin
2021-05-17 15:12 ` Daniel Vetter
2021-05-17 15:33 ` Tvrtko Ursulin [this message]
2021-05-17 15:51 ` Daniel Vetter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=596bdfdd-df65-2166-4839-86e6c3308772@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=Intel-gfx@lists.freedesktop.org \
--cc=daniel.vetter@ffwll.ch \
--cc=daniel@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=jason.ekstrand@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox