From: John Harrison <john.c.harrison@intel.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH v5] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset
Date: Mon, 28 Aug 2023 16:01:38 -0700 [thread overview]
Message-ID: <2bd0fb41-58d5-9862-143e-34e31f6f791f@intel.com> (raw)
In-Reply-To: <e31b1f49-88cd-d6e4-abbe-51f27712ff83@intel.com>
On 8/23/2023 10:37, John Harrison wrote:
> On 8/23/2023 09:00, Daniel Vetter wrote:
>> On Tue, Aug 22, 2023 at 11:53:24AM -0700, John Harrison wrote:
>>> On 8/11/2023 11:20, Zhanjun Dong wrote:
>>>> This attempts to avoid circular locking dependency between flush
>>>> delayed
>>>> work and intel_gt_reset.
>>>> When intel_gt_reset was called, task will hold a lock.
>>>> To cacel delayed work here, the _sync version will also acquire a
>>>> lock,
>>>> which might trigger the possible cirular locking dependency warning.
>>>> When intel_gt_reset called, reset_in_progress flag will be set, add
>>>> code
>>>> to check the flag, call async verion if reset is in progress.
>>>>
>>>> Signed-off-by: Zhanjun Dong<zhanjun.dong@intel.com>
>>>> Cc: John Harrison<John.C.Harrison@Intel.com>
>>>> Cc: Andi Shyti<andi.shyti@linux.intel.com>
>>>> Cc: Daniel Vetter<daniel@ffwll.ch>
>>>> ---
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++++++++++-
>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> index a0e3ef1c65d2..600388c849f7 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> @@ -1359,7 +1359,16 @@ static void
>>>> guc_enable_busyness_worker(struct intel_guc *guc)
>>>> static void guc_cancel_busyness_worker(struct intel_guc *guc)
>>>> {
>>>> - cancel_delayed_work_sync(&guc->timestamp.work);
>>>> + /*
>>>> + * When intel_gt_reset was called, task will hold a lock.
>>>> + * To cacel delayed work here, the _sync version will also
>>>> acquire a lock, which might
>>>> + * trigger the possible cirular locking dependency warning.
>>>> + * Check the reset_in_progress flag, call async verion if
>>>> reset is in progress.
>>>> + */
>>> This needs to explain in much more detail what is going on and why
>>> it is not
>>> a problem. E.g.:
>>>
>>> The busyness worker needs to be cancelled. In general that means
>>> using the synchronous cancel version to ensure that an in-progress
>>> worker will not keep executing beyond whatever is happening that
>>> needs the cancel. E.g. suspend, driver unload, etc. However, in the
>>> case of a reset, the synchronous version is not required and can
>>> trigger a false deadlock detection warning.
>>>
>>> The business worker takes the reset mutex to protect against resets
>>> interfering with it. However, it does a trylock and bails out if
>>> the
>>> reset lock is already acquired. Thus there is no actual deadlock or
>>> other concern with the worker running concurrently with a reset. So
>>> an asynchronous cancel is safe in the case of a reset rather than a
>>> driver unload or suspend type operation. On the other hand, if the
>>> cancel_sync version is used when a reset is in progress then the
>>> mutex deadlock detection sees the mutex being acquired through
>>> multiple paths and complains.
>>>
>>> So just don't bother. That keeps the detection code happy and is
>>> safe because of the trylock code described above.
>> So why do we even need to cancel anything if it doesn't do anything
>> while
>> the reset is in progress?
> It still needs to be cancelled. The worker only aborts if it is
> actively executing concurrently with the reset. It might not start to
> execute until after the reset has completed. And there is presumably a
> reason why the cancel is being called, a reason not necessarily
> related to resets at all. Leaving the worker to run arbitrarily after
> the driver is expecting it to be stopped will lead to much worse
> things than a fake lockdep splat, e.g. a use after free pointer deref.
>
> John.
@Daniel Vetter - ping? Is this explanation sufficient? Are you okay with
this change now?
John.
>
>>
>> Just remove the cancel from the reset path as uneeded instead, and
>> explain
>> why that's ok? Because that's defacto what the cancel_work with a
>> potential deadlock scenario for cancel_work_sync does, you either don't
>> need it at all, or the replacement creates a bug.
>> -Daniel
>>
>>>
>>> John.
>>>
>>>
>>>> + if (guc_to_gt(guc)->uc.reset_in_progress)
>>>> + cancel_delayed_work(&guc->timestamp.work);
>>>> + else
>>>> + cancel_delayed_work_sync(&guc->timestamp.work);
>>>> }
>>>> static void __reset_guc_busyness_stats(struct intel_guc *guc)
>
next prev parent reply other threads:[~2023-08-28 23:01 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-11 18:20 [Intel-gfx] [PATCH v5] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset Zhanjun Dong
2023-08-11 19:15 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Avoid circular locking dependency when flush delayed work on gt reset (rev5) Patchwork
2023-08-11 19:15 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2023-08-11 19:34 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2023-08-21 14:09 ` [Intel-gfx] [PATCH v5] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset Andi Shyti
2023-08-22 13:50 ` Daniel Vetter
2023-08-22 14:14 ` Dong, Zhanjun
2023-08-22 14:28 ` Daniel Vetter
2023-08-22 18:53 ` John Harrison
2023-08-23 16:00 ` Daniel Vetter
2023-08-23 17:37 ` John Harrison
2023-08-28 23:01 ` John Harrison [this message]
2023-09-06 6:50 ` Daniel Vetter
2023-09-06 18:40 ` John Harrison
2023-08-31 14:00 ` Andi Shyti
2023-08-31 22:27 ` John Harrison
2023-09-06 9:17 ` Andi Shyti
2023-09-06 10:04 ` Jani Nikula
2023-09-06 11:02 ` Daniel Vetter
2023-09-06 18:49 ` John Harrison
2023-08-29 10:11 ` Andi Shyti
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2bd0fb41-58d5-9862-143e-34e31f6f791f@intel.com \
--to=john.c.harrison@intel.com \
--cc=daniel@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox