public inbox for intel-gfx@lists.freedesktop.org
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/pmu: Fix synchronization of PMU callback with reset
Date: Tue, 23 Nov 2021 09:15:10 +0000	[thread overview]
Message-ID: <20493aa0-c5b3-c8c2-38f2-c989e0ee2cb0@linux.intel.com> (raw)
In-Reply-To: <20211122233933.GA1371@unerlige-ril-10.165.21.208>


On 22/11/2021 23:39, Umesh Nerlige Ramappa wrote:
> On Mon, Nov 22, 2021 at 03:44:29PM +0000, Tvrtko Ursulin wrote:
>>
>> On 11/11/2021 16:48, Umesh Nerlige Ramappa wrote:
>>> On Thu, Nov 11, 2021 at 02:37:43PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 04/11/2021 22:04, Umesh Nerlige Ramappa wrote:
>>>>> On Thu, Nov 04, 2021 at 05:37:37PM +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 03/11/2021 22:47, Umesh Nerlige Ramappa wrote:
>>>>>>> Since the PMU callback runs in irq context, it synchronizes with gt
>>>>>>> reset using the reset count. We could run into a case where the PMU
>>>>>>> callback could read the reset count before it is updated. This has
>>>>>>> the potential of corrupting the busyness stats.
>>>>>>>
>>>>>>> In addition to the reset count, check if the reset bit is set before
>>>>>>> capturing busyness.
>>>>>>>
>>>>>>> In addition save the previous stats only if you intend to update 
>>>>>>> them.
>>>>>>>
>>>>>>> Signed-off-by: Umesh Nerlige Ramappa 
>>>>>>> <umesh.nerlige.ramappa@intel.com>
>>>>>>> ---
>>>>>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 12 ++++++++----
>>>>>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>>> index 5cc49c0b3889..d83ade77ca07 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>>> @@ -1183,6 +1183,7 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now)
>>>>>>>      u64 total, gt_stamp_saved;
>>>>>>>      unsigned long flags;
>>>>>>>      u32 reset_count;
>>>>>>> +    bool in_reset;
>>>>>>>      spin_lock_irqsave(&guc->timestamp.lock, flags);
>>>>>>> @@ -1191,7 +1192,9 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now)
>>>>>>>       * engine busyness from GuC, so we just use the driver stored
>>>>>>>       * copy of busyness. Synchronize with gt reset using reset_count.
>>>>>>>       */
>>>>>>> -    reset_count = i915_reset_count(gpu_error);
>>>>>>> +    rcu_read_lock();
>>>>>>> +    in_reset = test_bit(I915_RESET_BACKOFF, &gt->reset.flags);
>>>>>>> +    rcu_read_unlock();
>>>>>>
>>>>>> I don't really understand the point of rcu_read_lock over test_bit 
>>>>>> but I guess you copied it from the trylock loop.
>>>>>
>>>>> Yes, I don't see other parts of the code using the lock though. I can 
>>>>> drop it.
>>>>>
>>>>>>
>>>>>>>      *now = ktime_get();
>>>>>>> @@ -1201,9 +1204,10 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now)
>>>>>>>       * start_gt_clk is derived from GuC state. To get a consistent
>>>>>>>       * view of activity, we query the GuC state only if gt is awake.
>>>>>>>       */
>>>>>>> -    stats_saved = *stats;
>>>>>>> -    gt_stamp_saved = guc->timestamp.gt_stamp;
>>>>>>> -    if (intel_gt_pm_get_if_awake(gt)) {
>>>>>>> +    if (intel_gt_pm_get_if_awake(gt) && !in_reset) {
>>>>>>
>>>>>> What is the point of looking at the old value of in_reset here? 
>>>>>> Gut feeling says if there is a race this does not fix it.
>>>>>>
>>>>>> I did not figure out from the commit message what "could read 
>>>>>> the reset count before it is updated" means.
>>>>>> I thought the point of reading the reset count twice was that you 
>>>>>> are sure there was no reset while in here, in which case it is safe 
>>>>>> to update the software copy. I don't easily see what test_bit does 
>>>>>> on top.
>>>>>
>>>>> This is what I see in the reset flow
>>>>> ---------------
>>>>>
>>>>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>>>>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>>>>> R3) reset prepare
>>>>> R4) do the HW reset
>>>>>
>>>>> The reset count is updated only once above and that's before an 
>>>>> actual HW reset happens.
>>>>>
>>>>> PMU callback flow before this patch
>>>>> ---------------
>>>>>
>>>>> P1) read reset count
>>>>> P2) update stats
>>>>> P3) read reset count
>>>>> P4) if reset count changed, use old stats. if not use updated stats.
>>>>>
>>>>> I am concerned that the PMU flow could run after step (R2). Then we 
>>>>> wrongly conclude that the count stayed the same and no HW reset 
>>>>> happened.
>>>
>>> Here is the problematic sequence: Threads R and P.
>>> ------------
>>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>>>     P1) read reset count
>>>     P2) update stats
>>>     P3) read reset count
>>>     P4) if reset count changed, use old stats. if not use updated stats.
>>> R3) reset prepare
>>> R4) do the HW reset
>>>
>>> Do you agree that this is racy? In thread P, we don't know whether the 
>>> reset flag was set when we captured the reset count in P1.
>>>
>>>>>
>>>>> PMU callback flow with this patch
>>>>> ---------------
>>>>> This would rely on the reset_count only if a reset is not in progress.
>>>>>
>>>>> P0) test_bit for I915_RESET_BACKOFF
>>>>> P1) read reset count if not in reset. if in reset, use old stats
>>>>> P2) update stats
>>>>> P3) read reset count
>>>>> P4) if reset count changed, use old stats. if not use updated stats.
>>>>>
>>>>> Now that I think about it more, I do see one sequence that still 
>>>>> needs fixing though - P0, R1, R2, P1 - P4. For that, I think I need 
>>>>> to re-read the BACKOFF bit after reading the reset_count for the 
>>>>> first time.
>>>>> Modified PMU callback sequence would be:
>>>>> ----------
>>>>>
>>>>> M0) test_bit for I915_RESET_BACKOFF
>>>>> M1) read reset count if not in reset, if in reset, use old stats
>>>>>
>>>>> M1.1) test_bit for I915_RESET_BACKOFF. if set, use old stats. if 
>>>>> not, use reset_count to synchronize
>>>>>
>>>>> M2) update stats
>>>>> M3) read reset count
>>>>> M4) if reset count changed, use old stats. if not use updated stats.
>>>>
>>>> You did not end up implementing this flow? Have you later changed 
>>>> your mind about whether it is required? Or maybe I am not looking at 
>>>> the latest patch.
>>>>
>>>> Is the below the latest?
>>>>
>>>> """
>>>> v2:
>>>> - The 2 reset counts captured in the PMU callback can end up being the
>>>>  same if they were captured right after the count is incremented in the
>>>>  reset flow. This can lead to a bad busyness state. Ensure that reset
>>>>  is not in progress when the initial reset count is captured.
>>>> """
>>>
>>> Yes, v2 is the latest (maybe CI results re-ordered the patches). 
>>> Instead of sampling the BACKOFF flag before and after the reset count 
>>> (as in the modified sequence), I just sample it after. The order is 
>>> critical - first sample reset count and then the reset flag.
>>>
>>>>
>>>> Is the key now that you rely on ordering of atomic_inc and set_bit 
>>>> in the reset path?
>>>
>>> Yes
>>>
>>>> Frankly I still don't understand why you can get away with using a 
>>>> stale in_reset in v2. If you acknowledge it can change between 
>>>> sampling and checking, then what is the point in having it at all? 
>>>> You still solely rely on the reset count in that case, no?
>>>
>>> Correct, but now I know for sure that the first sample of reset_count 
>>> was captured when reset flag was not set (since I am relying on the 
>>> order of sampling).
>>>
>>> About solely using the reset_count, I have listed the problematic 
>>> sequence above to highlight what the issue is.
>>
>> It was this:
>>
>> """
>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>>     P1) read reset count
>>     P2) update stats
>>     P3) read reset count
>>     P4) if reset count changed, use old stats. if not use updated stats.
>> R3) reset prepare
>> R4) do the HW reset
>>
>> Do you agree that this is racy? In thread P, we don't know whether the 
>> reset flag was set when we captured the reset count in P1.
>> """
>>
>> Why does it matter if the reset flag was set or not? Let's see how 
>> things end up after this patch:
>>
>>     P1) Read and store reset bit
>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>>     P2) If reset bit was not set:
>>           P2.1) read reset count
>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>>           P2.2) update stats
>>           P2.3) read reset count
>>           P2.4) if reset count changed, use old stats. if not use 
>> updated stats.
>> R3) reset prepare
>> R4) do the HW reset
>>
>> So the reset bit got set between P1 and P2. How is that then any 
>> different from not looking at the reset bit at all?
> 
> But the new sequence in this patch is this:

Oops, I was looking at v1. Okay, I can't come up with any further races, 
acked.

Regards,

Tvrtko

> 
>      P0) read reset count
>      P1) Read and store reset bit
> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>      P2) If reset bit was not set:
> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>         P2.2) update stats
>         P2.3) read reset count
>         P2.4) if reset count changed, use old stats. if not use updated 
> stats.
> R3) reset prepare
> R4) do the HW reset
> 
> P2.1 moved to P0 when compared to the sequence you shared above.
> 
> Thanks,
> Umesh
> 
>>
>> Regards,
>>
>> Tvrtko


Thread overview: 13+ messages
2021-11-03 22:47 [Intel-gfx] [PATCH] drm/i915/pmu: Fix synchronization of PMU callback with reset Umesh Nerlige Ramappa
2021-11-03 23:47 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2021-11-04  0:55 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2021-11-04 15:57 ` [Intel-gfx] [PATCH] " Matthew Brost
2021-11-04 17:37 ` Tvrtko Ursulin
2021-11-04 22:04   ` Umesh Nerlige Ramappa
2021-11-11 14:37     ` Tvrtko Ursulin
2021-11-11 16:48       ` Umesh Nerlige Ramappa
2021-11-20  0:25         ` Umesh Nerlige Ramappa
2021-11-22 15:44         ` Tvrtko Ursulin
2021-11-22 23:39           ` Umesh Nerlige Ramappa
2021-11-23  9:15             ` Tvrtko Ursulin [this message]
  -- strict thread matches above, loose matches on Subject: below --
2021-11-08 21:10 Umesh Nerlige Ramappa
