Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: John Harrison <john.c.harrison@intel.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH v5] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset
Date: Mon, 28 Aug 2023 16:01:38 -0700	[thread overview]
Message-ID: <2bd0fb41-58d5-9862-143e-34e31f6f791f@intel.com> (raw)
In-Reply-To: <e31b1f49-88cd-d6e4-abbe-51f27712ff83@intel.com>

On 8/23/2023 10:37, John Harrison wrote:
> On 8/23/2023 09:00, Daniel Vetter wrote:
>> On Tue, Aug 22, 2023 at 11:53:24AM -0700, John Harrison wrote:
>>> On 8/11/2023 11:20, Zhanjun Dong wrote:
>>>> This attempts to avoid circular locking dependency between flush 
>>>> delayed
>>>> work and intel_gt_reset.
>>>> When intel_gt_reset was called, task will hold a lock.
>>>> To cacel delayed work here, the _sync version will also acquire a 
>>>> lock,
>>>> which might trigger the possible cirular locking dependency warning.
>>>> When intel_gt_reset called, reset_in_progress flag will be set, add 
>>>> code
>>>> to check the flag, call async verion if reset is in progress.
>>>>
>>>> Signed-off-by: Zhanjun Dong<zhanjun.dong@intel.com>
>>>> Cc: John Harrison<John.C.Harrison@Intel.com>
>>>> Cc: Andi Shyti<andi.shyti@linux.intel.com>
>>>> Cc: Daniel Vetter<daniel@ffwll.ch>
>>>> ---
>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++++++++++-
>>>>    1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> index a0e3ef1c65d2..600388c849f7 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> @@ -1359,7 +1359,16 @@ static void 
>>>> guc_enable_busyness_worker(struct intel_guc *guc)
>>>>    static void guc_cancel_busyness_worker(struct intel_guc *guc)
>>>>    {
>>>> -    cancel_delayed_work_sync(&guc->timestamp.work);
>>>> +    /*
>>>> +     * When intel_gt_reset was called, task will hold a lock.
>>>> +     * To cacel delayed work here, the _sync version will also 
>>>> acquire a lock, which might
>>>> +     * trigger the possible cirular locking dependency warning.
>>>> +     * Check the reset_in_progress flag, call async verion if 
>>>> reset is in progress.
>>>> +     */
>>> This needs to explain in much more detail what is going on and why 
>>> it is not
>>> a problem. E.g.:
>>>
>>>     The busyness worker needs to be cancelled. In general that means
>>>     using the synchronous cancel version to ensure that an in-progress
>>>     worker will not keep executing beyond whatever is happening that
>>>     needs the cancel. E.g. suspend, driver unload, etc. However, in the
>>>     case of a reset, the synchronous version is not required and can
>>>     trigger a false deadlock detection warning.
>>>
>>>     The business worker takes the reset mutex to protect against resets
>>>     interfering with it. However, it does a trylock and bails out if 
>>> the
>>>     reset lock is already acquired. Thus there is no actual deadlock or
>>>     other concern with the worker running concurrently with a reset. So
>>>     an asynchronous cancel is safe in the case of a reset rather than a
>>>     driver unload or suspend type operation. On the other hand, if the
>>>     cancel_sync version is used when a reset is in progress then the
>>>     mutex deadlock detection sees the mutex being acquired through
>>>     multiple paths and complains.
>>>
>>>     So just don't bother. That keeps the detection code happy and is
>>>     safe because of the trylock code described above.
>> So why do we even need to cancel anything if it doesn't do anything 
>> while
>> the reset is in progress?
> It still needs to be cancelled. The worker only aborts if it is 
> actively executing concurrently with the reset. It might not start to 
> execute until after the reset has completed. And there is presumably a 
> reason why the cancel is being called, a reason not necessarily 
> related to resets at all. Leaving the worker to run arbitrarily after 
> the driver is expecting it to be stopped will lead to much worse 
> things than a fake lockdep splat, e.g. a use after free pointer deref.
>
> John.
@Daniel Vetter - ping? Is this explanation sufficient? Are you okay with 
this change now?

John.

>
>>
>> Just remove the cancel from the reset path as uneeded instead, and 
>> explain
>> why that's ok? Because that's defacto what the cancel_work with a
>> potential deadlock scenario for cancel_work_sync does, you either don't
>> need it at all, or the replacement creates a bug.
>> -Daniel
>>
>>>
>>> John.
>>>
>>>
>>>> +    if (guc_to_gt(guc)->uc.reset_in_progress)
>>>> +        cancel_delayed_work(&guc->timestamp.work);
>>>> +    else
>>>> + cancel_delayed_work_sync(&guc->timestamp.work);
>>>>    }
>>>>    static void __reset_guc_busyness_stats(struct intel_guc *guc)
>


  reply	other threads:[~2023-08-28 23:01 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-11 18:20 [Intel-gfx] [PATCH v5] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset Zhanjun Dong
2023-08-11 19:15 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Avoid circular locking dependency when flush delayed work on gt reset (rev5) Patchwork
2023-08-11 19:15 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2023-08-11 19:34 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2023-08-21 14:09 ` [Intel-gfx] [PATCH v5] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset Andi Shyti
2023-08-22 13:50 ` Daniel Vetter
2023-08-22 14:14   ` Dong, Zhanjun
2023-08-22 14:28     ` Daniel Vetter
2023-08-22 18:53 ` John Harrison
2023-08-23 16:00   ` Daniel Vetter
2023-08-23 17:37     ` John Harrison
2023-08-28 23:01       ` John Harrison [this message]
2023-09-06  6:50         ` Daniel Vetter
2023-09-06 18:40           ` John Harrison
2023-08-31 14:00       ` Andi Shyti
2023-08-31 22:27         ` John Harrison
2023-09-06  9:17           ` Andi Shyti
2023-09-06 10:04             ` Jani Nikula
2023-09-06 11:02               ` Daniel Vetter
2023-09-06 18:49             ` John Harrison
2023-08-29 10:11   ` Andi Shyti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2bd0fb41-58d5-9862-143e-34e31f6f791f@intel.com \
    --to=john.c.harrison@intel.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox