From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/pmu: Fix synchronization of PMU callback with reset
Date: Tue, 23 Nov 2021 09:15:10 +0000 [thread overview]
Message-ID: <20493aa0-c5b3-c8c2-38f2-c989e0ee2cb0@linux.intel.com> (raw)
In-Reply-To: <20211122233933.GA1371@unerlige-ril-10.165.21.208>
On 22/11/2021 23:39, Umesh Nerlige Ramappa wrote:
> On Mon, Nov 22, 2021 at 03:44:29PM +0000, Tvrtko Ursulin wrote:
>>
>> On 11/11/2021 16:48, Umesh Nerlige Ramappa wrote:
>>> On Thu, Nov 11, 2021 at 02:37:43PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 04/11/2021 22:04, Umesh Nerlige Ramappa wrote:
>>>>> On Thu, Nov 04, 2021 at 05:37:37PM +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 03/11/2021 22:47, Umesh Nerlige Ramappa wrote:
>>>>>>> Since the PMU callback runs in irq context, it synchronizes with gt
>>>>>>> reset using the reset count. We could run into a case where the PMU
>>>>>>> callback could read the reset count before it is updated. This has a
>>>>>>> potential of corrupting the busyness stats.
>>>>>>>
>>>>>>> In addition to the reset count, check if the reset bit is set before
>>>>>>> capturing busyness.
>>>>>>>
>>>>>>> In addition save the previous stats only if you intend to update
>>>>>>> them.
>>>>>>>
>>>>>>> Signed-off-by: Umesh Nerlige Ramappa
>>>>>>> <umesh.nerlige.ramappa@intel.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 12 ++++++++----
>>>>>>> 1 file changed, 8 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>>> index 5cc49c0b3889..d83ade77ca07 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>>>> @@ -1183,6 +1183,7 @@ static ktime_t guc_engine_busyness(struct
>>>>>>> intel_engine_cs *engine, ktime_t *now)
>>>>>>> u64 total, gt_stamp_saved;
>>>>>>> unsigned long flags;
>>>>>>> u32 reset_count;
>>>>>>> + bool in_reset;
>>>>>>> spin_lock_irqsave(&guc->timestamp.lock, flags);
>>>>>>> @@ -1191,7 +1192,9 @@ static ktime_t guc_engine_busyness(struct
>>>>>>> intel_engine_cs *engine, ktime_t *now)
>>>>>>> * engine busyness from GuC, so we just use the driver stored
>>>>>>> * copy of busyness. Synchronize with gt reset using
>>>>>>> reset_count.
>>>>>>> */
>>>>>>> - reset_count = i915_reset_count(gpu_error);
>>>>>>> + rcu_read_lock();
>>>>>>> + in_reset = test_bit(I915_RESET_BACKOFF, &gt->reset.flags);
>>>>>>> + rcu_read_unlock();
>>>>>>
>>>>>> I don't really understand the point of rcu_read_lock over test_bit
>>>>>> but I guess you copied it from the trylock loop.
>>>>>
>>>>> Yes, I don't see other parts of code using the lock though. I can
>>>>> drop it.
>>>>>
>>>>>>
>>>>>>> *now = ktime_get();
>>>>>>> @@ -1201,9 +1204,10 @@ static ktime_t guc_engine_busyness(struct
>>>>>>> intel_engine_cs *engine, ktime_t *now)
>>>>>>> * start_gt_clk is derived from GuC state. To get a consistent
>>>>>>> * view of activity, we query the GuC state only if gt is
>>>>>>> awake.
>>>>>>> */
>>>>>>> - stats_saved = *stats;
>>>>>>> - gt_stamp_saved = guc->timestamp.gt_stamp;
>>>>>>> - if (intel_gt_pm_get_if_awake(gt)) {
>>>>>>> + if (intel_gt_pm_get_if_awake(gt) && !in_reset) {
>>>>>>
>>>>>> What is the point of looking at the old value of in_reset here?
>>>>>> Gut feeling says if there is a race this does not fix it.
>>>>>>
>>>>>> I did not figure out from the commit message what does "could read
>>>>>> the reset count before it is updated" mean?
>>>>>> I thought the point of reading
>>>>>
>>>>>> the reset count twice was that you are sure there was no reset
>>>>>> while in here, in which case it is safe to update the software
>>>>>> copy. I don't easily see what test_bit does on top.
>>>>>
>>>>> This is what I see in the reset flow
>>>>> ---------------
>>>>>
>>>>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>>>>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>>>>> R3) reset prepare
>>>>> R4) do the HW reset
>>>>>
>>>>> The reset count is updated only once above and that's before an
>>>>> actual HW reset happens.
>>>>>
>>>>> PMU callback flow before this patch
>>>>> ---------------
>>>>>
>>>>> P1) read reset count
>>>>> P2) update stats
>>>>> P3) read reset count
>>>>> P4) if reset count changed, use old stats. if not use updated stats.
>>>>>
>>>>> I am concerned that the PMU flow could run after step (R2). Then we
>>>>> wrongly conclude that the count stayed the same and no HW reset
>>>>> happened.
>>>
>>> Here is the problematic sequence: Threads R and P.
>>> ------------
>>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>>> P1) read reset count
>>> P2) update stats
>>> P3) read reset count
>>> P4) if reset count changed, use old stats. if not use updated stats.
>>> R3) reset prepare
>>> R4) do the HW reset
>>>
>>> Do you agree that this is racy? In thread P we don't know if the
>>> reset flag was set or not when we captured the reset count in P1.
>>>
>>>>>
>>>>> PMU callback flow with this patch
>>>>> ---------------
>>>>> This would rely on the reset_count only if a reset is not in progress.
>>>>>
>>>>> P0) test_bit for I915_RESET_BACKOFF
>>>>> P1) read reset count if not in reset. if in reset, use old stats
>>>>> P2) update stats
>>>>> P3) read reset count
>>>>> P4) if reset count changed, use old stats. if not use updated stats.
>>>>>
>>>>> Now that I think about it more, I do see one sequence that still
>>>>> needs fixing though - P0, R1, R2, P1 - P4. For that, I think I need
>>>>> to re-read the BACKOFF bit after reading the reset_count for the
>>>>> first time.
>>>>> Modified PMU callback sequence would be:
>>>>> ----------
>>>>>
>>>>> M0) test_bit for I915_RESET_BACKOFF
>>>>> M1) read reset count if not in reset, if in reset, use old stats
>>>>>
>>>>> M1.1) test_bit for I915_RESET_BACKOFF. if set, use old stats. if
>>>>> not, use reset_count to synchronize
>>>>>
>>>>> M2) update stats
>>>>> M3) read reset count
>>>>> M4) if reset count changed, use old stats. if not use updated stats.
>>>>
>>>> You did not end up implementing this flow? Have you later changed
>>>> your mind whether it is required or not? Or maybe I am looking at
>>>> not the latest patch.
>>>>
>>>> Is the below the latest?
>>>>
>>>> """
>>>> v2:
>>>> - The 2 reset counts captured in the PMU callback can end up being the
>>>> same if they were captured right after the count is incremented in the
>>>> reset flow. This can lead to a bad busyness state. Ensure that reset
>>>> is not in progress when the initial reset count is captured.
>>>> """
>>>
>>> Yes, v2 is the latest (maybe CI results re-ordered the patches).
>>> Instead of sampling the BACKOFF flag before and after the reset count
>>> (as in the modified sequence), I just sample it after. The order is
>>> critical - first sample reset count and then the reset flag.
>>>
>>>>
>>>> Is the key now that you rely on ordering of atomic_inc and set_bit
>>>> in the reset path?
>>>
>>> Yes
>>>
>>>> Frankly I still don't understand why you can get away
>>>
>>>> with using stale in_reset in v2. If you acknowledge it can change
>>>> between sampling and checking, then what is the point in having it
>>>> altogether? You still solely rely on reset count in that case, no?
>>>
>>> Correct, but now I know for sure that the first sample of reset_count
>>> was captured when reset flag was not set (since I am relying on the
>>> order of sampling).
>>>
>>> About solely using the reset_count, I have listed the problematic
>>> sequence above to highlight what the issue is.
>>
>> It was this:
>>
>> """
>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>> P1) read reset count
>> P2) update stats
>> P3) read reset count
>> P4) if reset count changed, use old stats. if not use updated stats.
>> R3) reset prepare
>> R4) do the HW reset
>>
>> Do you agree that this is racy? In thread P we don't know if the
>> reset flag was set or not when we captured the reset count in P1.
>> """
>>
>> Why does it matter if the reset flag was set or not? Let's see how things
>> are after this patch:
>>
>> After this patch it ends like this:
>>
>> P1) Read and store reset bit
>> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
>> P2) If reset bit was not set:
>> P2.1) read reset count
>> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
>> P2.2) update stats
>> P2.3) read reset count
>> P2.4) if reset count changed, use old stats. if not use
>> updated stats.
>> R3) reset prepare
>> R4) do the HW reset
>>
>> So the reset bit got set between P1 and P2. How is that then not the
>> same as not looking at the reset bit at all?
>
> But the new sequence in this patch is this:
Oops I was looking at v1. Okay I can't come up with any further races,
acked.
Regards,
Tvrtko
>
> P0) read reset count
> P1) Read and store reset bit
> R1) test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)
> P2) If reset bit was not set:
> R2) atomic_inc(&gt->i915->gpu_error.reset_count)
> P2.2) update stats
> P2.3) read reset count
> P2.4) if reset count changed, use old stats. if not use updated
> stats.
> R3) reset prepare
> R4) do the HW reset
>
> P2.1 moved to P0 when compared to the sequence you shared above.
>
> Thanks,
> Umesh
>
>>
>> Regards,
>>
>> Tvrtko