From: Francisco Jerez <currojerez@riseup.net>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: stable@vger.kernel.org
Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/gt: Shrink the RPS evalution intervals
Date: Tue, 14 Apr 2020 15:27:33 -0700 [thread overview]
Message-ID: <87ftd5zoru.fsf@riseup.net> (raw)
In-Reply-To: <158690010150.24667.4991755951851899304@build.alporthouse.com>
[-- Attachment #1.1.1: Type: text/plain, Size: 5605 bytes --]
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Quoting Francisco Jerez (2020-04-14 20:39:48)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > Quoting Chris Wilson (2020-04-14 17:14:23)
>> >> Try to make RPS dramatically more responsive by shrinking the evaluation
>> >> intervales by a factor of 100! The issue is as we now park the GPU
>> >> rapidly upon idling, a short or bursty workload such as the composited
>> >> desktop never sustains enough work to fill and complete an evaluation
>> >> window. As such, the frequency we program remains stuck. This was first
>> >> reported as once boosted, we never relinquished the boost [see commit
>> >> 21abf0bf168d ("drm/i915/gt: Treat idling as a RPS downclock event")] but
>> >> it equally applies in the order direction for bursty workloads that
>> >> *need* low latency, like desktop animations.
>> >>
>> >> What we could try is preserve the incomplete EI history across idling,
>> >> it is not clear whether that would be effective, nor whether the
>> >> presumption of continuous workloads is accurate. A clearer path seems to
>> >> treat it as symptomatic that we fail to handle bursty workload with the
>> >> current EI, and seek to address that by shrinking the EI so the
>> >> evaluations are run much more often.
>> >>
>> >> This will likely entail more frequent interrupts, and by the time we
>> >> process the interrupt in the bottom half [from inside a worker], the
>> >> workload on the GPU has changed. To address the changeable nature, in
>> >> the previous patch we compared the previous complete EI with the
>> >> interrupt request and only up/down clock if both agree. The impact of
>> >> asking for, and presumably, receiving more interrupts is still to be
>> >> determined and mitigations sought. The first idea is to differentiate
>> >> between up/down responsivity and make upclocking more responsive than
>> >> downlocking. This should both help thwart jitter on bursty workloads by
>> >> making it easier to increase than it is to decrease frequencies, and
>> >> reduce the number of interrupts we would need to process.
>> >
>> > Another worry I'd like to raise, is that by reducing the EI we risk
>> > unstable evaluations. I'm not sure how accurate the HW is, and I worry
>> > about borderline workloads (if that is possible) but mainly the worry is
>> > how the HW is sampling.
>> >
>> > The other unmentioned unknown is the latency in reprogramming the
>> > frequency. At what point does it start to become a significant factor?
>> > I'm presuming the RPS evaluation itself is free, until it has to talk
>> > across the chip to send an interrupt.
>> > -Chris
>>
>> At least on ICL the problem which this patch and 21abf0bf168d were
>> working around seems to have to do with RPS interrupt delivery being
>> inadvertently blocked for extended periods of time. Looking at the GPU
>> utilization and RPS events on a graph I could see the GPU being stuck at
>> low frequency without any RPS interrupts firing, for a time interval
>> orders of magnitude greater than the EI we're theoretically programming
>> today. IOW it seems like the real problem isn't that our EIs are too
>> long, but that we're missing a bunch of them.
>
> Just stuck a pr_err() into gen11_handle_rps_events(), and momentarily
> before we were throttled (and so capped at 100% load), interrupts were
> being delivered:
>
> [ 887.521727] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.538039] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.538253] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.538555] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.554731] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.554857] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.555604] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.571373] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.571496] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.571646] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.588199] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.588380] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.588692] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.604718] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.604937] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.621591] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.621755] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.637988] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.638166] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.638803] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.654812] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.655029] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.671423] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.671649] gen11_rps_irq_handler: { iir:20, events:20 }
>
> That looks within expectations for the short EI settings. So many
> interrupts is a drag, and I would be tempted to remove the process bottom
> half.
>
> Oh well, I should check how many of those are translated into frequency
> updates. I just wanted to first check if in the first try I stumbled
> into the same loss of interrupts issue.
The interrupt loss seems to be non-deterministic, it's not like we lose
100% of them, since there is always a chance that the GPU is awake
during the PMINTRMSK write. It's easily noticeable anyway with most
GPU-bound workloads on ICL, particularly with the current long EIs.
> -Chris
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 227 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
WARNING: multiple messages have this Message-ID (diff)
From: Francisco Jerez <currojerez@riseup.net>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: stable@vger.kernel.org
Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/gt: Shrink the RPS evalution intervals
Date: Tue, 14 Apr 2020 15:27:33 -0700 [thread overview]
Message-ID: <87ftd5zoru.fsf@riseup.net> (raw)
In-Reply-To: <158690010150.24667.4991755951851899304@build.alporthouse.com>
[-- Attachment #1.1: Type: text/plain, Size: 5605 bytes --]
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Quoting Francisco Jerez (2020-04-14 20:39:48)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > Quoting Chris Wilson (2020-04-14 17:14:23)
>> >> Try to make RPS dramatically more responsive by shrinking the evaluation
>> >> intervales by a factor of 100! The issue is as we now park the GPU
>> >> rapidly upon idling, a short or bursty workload such as the composited
>> >> desktop never sustains enough work to fill and complete an evaluation
>> >> window. As such, the frequency we program remains stuck. This was first
>> >> reported as once boosted, we never relinquished the boost [see commit
>> >> 21abf0bf168d ("drm/i915/gt: Treat idling as a RPS downclock event")] but
>> >> it equally applies in the order direction for bursty workloads that
>> >> *need* low latency, like desktop animations.
>> >>
>> >> What we could try is preserve the incomplete EI history across idling,
>> >> it is not clear whether that would be effective, nor whether the
>> >> presumption of continuous workloads is accurate. A clearer path seems to
>> >> treat it as symptomatic that we fail to handle bursty workload with the
>> >> current EI, and seek to address that by shrinking the EI so the
>> >> evaluations are run much more often.
>> >>
>> >> This will likely entail more frequent interrupts, and by the time we
>> >> process the interrupt in the bottom half [from inside a worker], the
>> >> workload on the GPU has changed. To address the changeable nature, in
>> >> the previous patch we compared the previous complete EI with the
>> >> interrupt request and only up/down clock if both agree. The impact of
>> >> asking for, and presumably, receiving more interrupts is still to be
>> >> determined and mitigations sought. The first idea is to differentiate
>> >> between up/down responsivity and make upclocking more responsive than
>> >> downlocking. This should both help thwart jitter on bursty workloads by
>> >> making it easier to increase than it is to decrease frequencies, and
>> >> reduce the number of interrupts we would need to process.
>> >
>> > Another worry I'd like to raise, is that by reducing the EI we risk
>> > unstable evaluations. I'm not sure how accurate the HW is, and I worry
>> > about borderline workloads (if that is possible) but mainly the worry is
>> > how the HW is sampling.
>> >
>> > The other unmentioned unknown is the latency in reprogramming the
>> > frequency. At what point does it start to become a significant factor?
>> > I'm presuming the RPS evaluation itself is free, until it has to talk
>> > across the chip to send an interrupt.
>> > -Chris
>>
>> At least on ICL the problem which this patch and 21abf0bf168d were
>> working around seems to have to do with RPS interrupt delivery being
>> inadvertently blocked for extended periods of time. Looking at the GPU
>> utilization and RPS events on a graph I could see the GPU being stuck at
>> low frequency without any RPS interrupts firing, for a time interval
>> orders of magnitude greater than the EI we're theoretically programming
>> today. IOW it seems like the real problem isn't that our EIs are too
>> long, but that we're missing a bunch of them.
>
> Just stuck a pr_err() into gen11_handle_rps_events(), and momentarily
> before we were throttled (and so capped at 100% load), interrupts were
> being delivered:
>
> [ 887.521727] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.538039] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.538253] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.538555] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.554731] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.554857] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.555604] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.571373] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.571496] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.571646] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.588199] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.588380] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.588692] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.604718] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.604937] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.621591] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.621755] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.637988] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.638166] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.638803] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.654812] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.655029] gen11_rps_irq_handler: { iir:20, events:20 }
> [ 887.671423] gen11_rps_irq_handler: { iir:10, events:10 }
> [ 887.671649] gen11_rps_irq_handler: { iir:20, events:20 }
>
> That looks within expectations for the short EI settings. So many
> interrupts is a drag, and I would be tempted to remove the process bottom
> half.
>
> Oh well, I should check how many of those are translated into frequency
> updates. I just wanted to first check if in the first try I stumbled
> into the same loss of interrupts issue.
The interrupt loss seems to be non-deterministic, it's not like we lose
100% of them, since there is always a chance that the GPU is awake
during the PMINTRMSK write. It's easily noticeable anyway with most
GPU-bound workloads on ICL, particularly with the current long EIs.
> -Chris
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 227 bytes --]
next prev parent reply other threads:[~2020-04-14 22:27 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-14 16:14 [Intel-gfx] [PATCH 1/2] drm/i915/gt: Try to smooth RPS spikes Chris Wilson
2020-04-14 16:14 ` [Intel-gfx] [PATCH 2/2] drm/i915/gt: Shrink the RPS evalution intervals Chris Wilson
2020-04-14 16:14 ` Chris Wilson
2020-04-14 16:35 ` [Intel-gfx] " Chris Wilson
2020-04-14 16:35 ` Chris Wilson
2020-04-14 19:39 ` [Intel-gfx] " Francisco Jerez
2020-04-14 19:39 ` Francisco Jerez
2020-04-14 20:13 ` [Intel-gfx] " Chris Wilson
2020-04-14 20:13 ` Chris Wilson
2020-04-14 21:00 ` Francisco Jerez
2020-04-14 21:00 ` Francisco Jerez
2020-04-14 21:52 ` Chris Wilson
2020-04-14 21:52 ` Chris Wilson
2020-04-14 22:28 ` Francisco Jerez
2020-04-14 22:28 ` Francisco Jerez
2020-04-14 22:38 ` Francisco Jerez
2020-04-14 22:38 ` Francisco Jerez
2020-04-14 21:35 ` Chris Wilson
2020-04-14 21:35 ` Chris Wilson
2020-04-14 22:27 ` Francisco Jerez [this message]
2020-04-14 22:27 ` Francisco Jerez
2020-04-15 7:37 ` Chris Wilson
2020-04-15 7:37 ` Chris Wilson
2020-04-15 11:36 ` Chris Wilson
2020-04-15 11:36 ` Chris Wilson
2020-04-15 11:11 ` Andi Shyti
2020-04-15 11:11 ` Andi Shyti
2020-04-15 0:56 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/2] drm/i915/gt: Try to smooth RPS spikes Patchwork
2020-04-15 1:13 ` [Intel-gfx] ✗ Fi.CI.DOCS: " Patchwork
2020-04-15 1:18 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-04-15 10:56 ` [Intel-gfx] [PATCH 1/2] " Andi Shyti
2020-04-15 11:24 ` Chris Wilson
2020-04-15 11:45 ` Chris Wilson
2020-04-15 15:22 ` [Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [1/2] " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ftd5zoru.fsf@riseup.net \
--to=currojerez@riseup.net \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.