From: Andrzej Hajda <andrzej.hajda@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
Andi Shyti <andi.shyti@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org,
Chris Wilson <chris.p.wilson@intel.com>,
Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [Intel-gfx] [PATCH v2] drm/i915/gt: update request engine before removing virtual GuC engine
Date: Wed, 12 Jul 2023 14:18:35 +0200 [thread overview]
Message-ID: <daeb0906-1b39-ebda-618f-dbce88f751bc@intel.com> (raw)
In-Reply-To: <6f981dd3-715a-6b7e-6c5d-d51610cddc88@linux.intel.com>
On 11.07.2023 17:27, Tvrtko Ursulin wrote:
>
> On 11/07/2023 14:58, Andrzej Hajda wrote:
>> On 11.07.2023 13:34, Andi Shyti wrote:
>>> Hi Andrzej,
>>>
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11
>>>> +++++++++++
>>>> 1 file changed, 11 insertions(+)
>>>>
>>>> diff --git
>>>> a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> index a0e3ef1c65d246..2c877ea5eda6f0 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> @@ -3461,6 +3461,8 @@ static void guc_prio_fini(struct
>>>> i915_request *rq, struct intel_context *ce)
>>>> static void remove_from_context(struct i915_request *rq)
>>>> {
>>>> struct intel_context *ce =
>>>> request_to_scheduling_context(rq);
>>>> + struct intel_engine_cs *engine;
>>>> + intel_engine_mask_t tmp;
>>>>
>>>> GEM_BUG_ON(intel_context_is_child(ce));
>>>>
>>>> @@ -3478,6 +3480,15 @@ static void
>>>> remove_from_context(struct i915_request *rq)
>>>>
>>>> atomic_dec(&ce->guc_id.ref);
>>>> i915_request_notify_execute_cb_imm(rq);
>>>> +
>>>> + /*
>>>> + * GuC virtual engine can disappear after this
>>>> call, so let's assign
>>>> + * something valid, as driver expects this to be
>>>> always valid pointer.
>>>> + */
>>>> + for_each_engine_masked(engine, rq->engine->gt,
>>>> rq->execution_mask, tmp) {
>>>> + rq->engine = engine;
>>>>
>>>> yes... here the context might lose the virtual engine... I wonder
>>>> whether this is the rigth solution, though. Maybe we should set
>>>> rq->engine = NULL; and check for NULL? Don't know.
>>>>
>>>>
>>>> Setting NULL causes occasional null page de-reference in
>>>>
>>>> i915_request_wait_timeout:
>>>>
>>>> mutex_release(&rq->engine->gt->reset.mutex.dep_map, _THIS_IP_)
>>>>
>>>> rq->engine after removing rq from context is (IMHO) used as a set of
>>>> aliases
>>>> for gt and i915 (despite rq itself contains the alias to i915).
>>> without investigating further, but maybe that code is not even
>>> supposed to be executed, at this point, if the request's assigned
>>> virtual engine is removed.
>>
>> Real tests show it is executed and the function
>> i915_request_wait_timeout is quite generic
>> I guess it is quite typical use-case, the only question is about
>> timings - what happens earlier -
>> finalization of i915_request_wait_timeout or context removal.
>>
>> The other point rq->engine is accessed after context removal is
>> i915_fence_release -
>> there is long comment there regarding virtual context and reuse
>> retired rq.
>> Anyway calling there "intel_engine_is_virtual(rq->engine)" is risky
>> without this patch and KASAN complains clearly about it:
>> http://gfx-ci.igk.intel.com/tree/drm-tip/kasan.html?testfilter=gem_exec_balancer
>
> Looks like a bug introduced in bcb9aa45d5a0 ("Revert "drm/i915: Hold
> reference to intel_context over life of i915_request""), which was a
> partial revert of 1e98d8c52ed5 ("drm/i915: Hold reference to
> intel_context over life of i915_request").
>
> Ie. if 1e98d8c52ed5 recognised the problem with disappearing rq->engine,
> then I am confused how bcb9aa45d5a0 left the rq->engine dereference in
> there after removing the extra reference.
>
> Could it be that the intel_engine_is_virtual check simply needs to be
> removed from i915_fence_release, restoring things to how they were
> before 1e98d8c52ed5? Could you try it out?
I have already tried something similar [1] and KASAN bugs disappeared,
or more precisely gem_exec_balance tests passed. But I have been warned
by Nirmoy guc virtual engines can be created for only one real engine
(ie. is_power_of_2(rq->execution_mask) is true but rq->engine points to
virtual engine).
[1]: https://patchwork.freedesktop.org/series/118879/
Regards
Andrzej
>
> Regards,
>
> Tvrtko
next prev parent reply other threads:[~2023-07-12 12:18 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-05 16:08 [Intel-gfx] [PATCH] drm/i915/gt: update request engine before removing virtual GuC engine Andrzej Hajda
2023-07-05 17:34 ` [Intel-gfx] ✗ Fi.CI.DOCS: warning for " Patchwork
2023-07-05 17:50 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-07-05 19:11 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2023-07-06 15:16 ` [Intel-gfx] [PATCH v2] " Andrzej Hajda
2023-07-07 8:05 ` Nirmoy Das
2023-07-11 10:18 ` Andi Shyti
2023-07-11 11:12 ` Andrzej Hajda
2023-07-11 11:34 ` Andi Shyti
2023-07-11 13:58 ` Andrzej Hajda
2023-07-11 15:27 ` Tvrtko Ursulin
2023-07-12 12:18 ` Andrzej Hajda [this message]
2023-07-12 12:35 ` Tvrtko Ursulin
2023-07-12 16:27 ` Andrzej Hajda
2023-07-12 18:54 ` John Harrison
2023-07-13 7:39 ` Tvrtko Ursulin
2023-07-13 8:56 ` Tvrtko Ursulin
2023-07-13 11:59 ` Andrzej Hajda
2023-07-13 12:22 ` Tvrtko Ursulin
2023-07-13 11:09 ` Andrzej Hajda
2023-07-13 12:11 ` Tvrtko Ursulin
2023-07-17 18:03 ` John Harrison
2023-07-18 15:48 ` Tvrtko Ursulin
2023-07-19 10:41 ` Andrzej Hajda
2023-07-19 12:43 ` Tvrtko Ursulin
2023-07-24 19:40 ` John Harrison
2023-07-06 20:01 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: update request engine before removing virtual GuC engine (rev2) Patchwork
2023-07-06 20:13 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-07-07 2:52 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2023-07-10 12:51 ` Andrzej Hajda
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=daeb0906-1b39-ebda-618f-dbce88f751bc@intel.com \
--to=andrzej.hajda@intel.com \
--cc=andi.shyti@linux.intel.com \
--cc=chris.p.wilson@intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=nirmoy.das@intel.com \
--cc=tvrtko.ursulin@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.