From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>,
"intel-gfx@lists.freedesktop.org"
<intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH v1] drm/i915/gem: Don't evict unmappable VMAs when pinning with PIN_MAPPABLE
Date: Fri, 18 Mar 2022 09:43:44 +0000 [thread overview]
Message-ID: <5389e455-e42f-993d-0868-d1babba891f9@linux.intel.com> (raw)
In-Reply-To: <c575afeb4f4740c18e12d8ec0eb37e76@intel.com>
On 18/03/2022 07:39, Kasireddy, Vivek wrote:
> Hi Tvrtko,
>
>>
>> On 17/03/2022 07:23, Vivek Kasireddy wrote:
>>> On platforms capable of allowing 8K (7680 x 4320) modes, pinning 2 or
>>> more framebuffers/scanout buffers results in only one that is mappable/
>>> fenceable. Therefore, pageflipping between these 2 FBs where only one
>>> is mappable/fenceable creates latencies large enough to miss alternate
>>> vblanks thereby producing less optimal framerate.
>>>
>>> This mainly happens because when i915_gem_object_pin_to_display_plane()
>>> is called to pin one of the FB objs, the associated vma is identified
>>> as misplaced -- because there is no space for it in the aperture --
>>> and therefore i915_vma_unbind() is called which unbinds and evicts it.
>>> This misplaced vma gets subseqently pinned only when
>>> i915_gem_object_ggtt_pin_ww() is called without PIN_MAPPABLE. This whole
>>> thing results in a latency of ~10ms and happens every other repaint cycle.
>>
>> Just out of curiosity - have you looked at where does this 10ms come
>> from? Like is it simply clearing/writing PTEs so expensive, or there is
>> more to it? Apologies if I asked this before..
> [Kasireddy, Vivek] It appears the ~10ms latency seems to come from the
> execution of gen8_ggtt_clear_range() as seen in the trace log:
> weston-4124 [008] 161210.397563: funcgraph_entry: | __i915_vma_evict() {
> weston-4124 [008] 161210.397564: funcgraph_entry: | ggtt_unbind_vma() {
> weston-4124 [008] 161210.397564: funcgraph_entry: | gen8_ggtt_clear_range() {
> weston-4124 [008] 161210.408012: funcgraph_exit: # 10416.281 us | }
> weston-4124 [008] 161210.408012: funcgraph_exit: # 10448.740 us | }
> weston-4124 [008] 161210.408012: funcgraph_entry: | __vma_put_pages() {
> weston-4124 [008] 161210.408013: funcgraph_entry: 0.083 us | clear_pages();
> weston-4124 [008] 161210.408013: funcgraph_exit: 0.578 us | }
> weston-4124 [008] 161210.408013: funcgraph_exit: # 10449.622 us | }
>
> And, for some reason, I can't get trace-cmd to capture more symbols to
> gain more insights. However, gen8_ggtt_clear_range() seems to just do
> this:
> for (i = 0; i < num_entries; i++)
> gen8_set_pte(>t_base[i], scratch_pte);
> where,
> vma's start = 182190080, length = 132710400, first = 44480, num_entries = 32400
> and we have
> void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
> {
> writeq(pte, addr);
> }
>
> Any idea why executing writeq 32400 times would result in a latency
> of ~10ms?
Uncached writes I guess, due VT-d workaround being required to avoid
display engine faulting due overfetching.
Something like https://patchwork.freedesktop.org/series/97492/ would
have probably resolved this issue as well. I guess time to go and read
that series in detail.
Regards,
Tvrtko
>
>>
>>> Therefore, to fix this issue, we just ensure that the misplaced VMA
>>> does not get evicted when we try to pin it with PIN_MAPPABLE -- by
>>> returning early if the mappable/fenceable flag is not set.
>>>
>>> Testcase:
>>> Running Weston and weston-simple-egl on an Alderlake_S (ADLS) platform
>>> with a 8K@60 mode results in only ~40 FPS (compared to ~59 FPS with
>>> this patch). Since upstream Weston submits a frame ~7ms before the
>>> next vblank, the latencies seen between atomic commit and flip event
>>> are 7, 24 (7 + 16.66), 7, 24..... suggesting that it misses the
>>> vblank every other frame.
>>>
>>> Here is the ftrace snippet that shows the source of the ~10ms latency:
>>> i915_gem_object_pin_to_display_plane() {
>>> 0.102 us | i915_gem_object_set_cache_level();
>>> i915_gem_object_ggtt_pin_ww() {
>>> 0.390 us | i915_vma_instance();
>>> 0.178 us | i915_vma_misplaced();
>>> i915_vma_unbind() {
>>> __i915_active_wait() {
>>> 0.082 us | i915_active_acquire_if_busy();
>>> 0.475 us | }
>>> intel_runtime_pm_get() {
>>> 0.087 us | intel_runtime_pm_acquire();
>>> 0.259 us | }
>>> __i915_active_wait() {
>>> 0.085 us | i915_active_acquire_if_busy();
>>> 0.240 us | }
>>> __i915_vma_evict() {
>>> ggtt_unbind_vma() {
>>> gen8_ggtt_clear_range() {
>>> 10507.255 us | }
>>> 10507.689 us | }
>>> 10508.516 us | }
>>>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>>> ---
>>> drivers/gpu/drm/i915/i915_gem.c | 8 +++++++-
>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 9747924cc57b..7307c5de1c58 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -939,8 +939,14 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object
>> *obj,
>>> if (i915_vma_is_pinned(vma) || i915_vma_is_active(vma))
>>> return ERR_PTR(-ENOSPC);
>>>
>>> + /*
>>> + * If this misplaced vma is too big (i.e, at-least
>>> + * half the size of aperture) or just unmappable,
>>> + * we would not be able to pin with PIN_MAPPABLE.
>>> + */
>>
>> I would be tempted to describe the ping-pong issue in the comment. In
>> short would do it, but just because git blame on a line of code tends to
>> fail to lead to the correct commit message after a while.
>>
>> So maybe just say along the lines of "If the misplaced vma is too big
>> ... or hasn't been pinned mappable before, we ignore the misplacement
>> when PIN_NONBLOCK is set in order to avoid ping-pong of double (or more)
>> -buffered frame buffers into the aperture and out on every vblank."
> [Kasireddy, Vivek] Ok, will include more comments in v2.
>
>>
>>> if (flags & PIN_MAPPABLE &&
>>> - vma->fence_size > ggtt->mappable_end / 2)
>>> + (vma->fence_size > ggtt->mappable_end / 2 ||
>>> + !i915_vma_is_map_and_fenceable(vma)))
>>> return ERR_PTR(-ENOSPC);
>>> }
>>>
>>
>> With the expanded comment it looks good to me.
>>
>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> [Kasireddy, Vivek] Thank you for the review.
>
> Thanks,
> Vivek
>>
>> + Daniel if he wants to double check it.
>>
>> Regards,
>>
>> Tvrtko
next prev parent reply other threads:[~2022-03-18 9:43 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-17 7:23 [Intel-gfx] [PATCH v1] drm/i915/gem: Don't evict unmappable VMAs when pinning with PIN_MAPPABLE Vivek Kasireddy
2022-03-17 8:27 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2022-03-17 10:01 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2022-03-17 11:16 ` [Intel-gfx] [PATCH v1] " Tvrtko Ursulin
2022-03-18 7:39 ` Kasireddy, Vivek
2022-03-18 9:43 ` Tvrtko Ursulin [this message]
2022-03-21 0:54 ` [Intel-gfx] [PATCH v2] drm/i915/gem: Don't evict unmappable VMAs when pinning with PIN_MAPPABLE (v2) Vivek Kasireddy
2022-03-21 12:34 ` Tvrtko Ursulin
2022-03-21 2:00 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gem: Don't evict unmappable VMAs when pinning with PIN_MAPPABLE (rev2) Patchwork
2022-03-21 3:20 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5389e455-e42f-993d-0868-d1babba891f9@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=vivek.kasireddy@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.