Intel-GFX Archive on lore.kernel.org
From: John Harrison <john.c.harrison@intel.com>
To: Alan Previn <alan.previn.teres.alexis@intel.com>,
	<intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero
Date: Thu, 28 Jul 2022 13:19:53 -0700	[thread overview]
Message-ID: <e9ab17b2-8068-cdc8-8b77-15c39b0a80ba@intel.com> (raw)
In-Reply-To: <20220628055130.1117146-3-alan.previn.teres.alexis@intel.com>

On 6/27/2022 22:51, Alan Previn wrote:
> From: Matthew Brost <matthew.brost@intel.com>
>
> Add a delay, configurable via debugs (default 100ms), to disable
debugs -> debugfs

Default is now 34ms?

> scheduling of a context after the pin count goes to zero. Disable
> scheduling is somewhat costly operation so the idea is a delay allows
costly operation as it requires synchronising with the GuC. So the idea

> the resubmit something before doing this operation. This delay is only
the user to resubmit

> done if the context isn't close and less than 3/4 of the guc_ids are in
close -> closed

less than a given threshold (default is 3/4) of the guc_ids

> use.
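
For readers following the thread: the gating described above can be sketched roughly as below. This is an illustrative userspace sketch, not the actual i915 code; the function and parameter names are invented, and the 3/4 threshold is the quoted default (the review notes it is configurable).

```c
#include <stdbool.h>

/*
 * Illustrative sketch of the gating described above: the sched-disable
 * H2G is delayed only when the context is still open and guc_id usage
 * is below the threshold (default 3/4 of the pool). Names are invented,
 * not the actual i915 symbols.
 */
bool should_delay_sched_disable(bool context_closed,
				unsigned int guc_ids_in_use,
				unsigned int guc_ids_total)
{
	/* A closed context will never be resubmitted, so disable now. */
	if (context_closed)
		return false;

	/* Under guc_id pressure, free the id immediately for new contexts. */
	if (4 * guc_ids_in_use >= 3 * guc_ids_total)
		return false;

	return true;
}
```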
>
> As temporary WA disable this feature for the selftests. Selftests are
> very timing sensitive and any change in timing can cause failure. A
> follow up patch will fixup the selftests to understand this delay.
>
> Alan Previn: Matt Brost first introduced this series back in Oct 2021.
> However no real world workload with measured performance impact was
> available to prove the intended results. Today, this series is being
> republished in response to a real world workload that benefited greatly
> from it along with measured performance improvement.
>
> Workload description: 36 containers were created on a DG2 device where
> each container was performing a combination of 720p 3d game rendering
> and 30fps video encoding. The workload density was configured in way
> that guaranteed each container to ALWAYS be able to render and
> encode no less than 30fps with a predefined maximum render + encode
> latency time. That means that the totality of all 36 containers and its
> workloads were not saturating the utilized hw engines to its max
> (in order to maintain just enough headroom to meet the minimum fps and
> latencies of incoming container submissions).
>
> Problem statement: It was observed that the CPU utilization of the CPU
> core that was pinned to i915 soft IRQ work was experiencing severe load.
> Using tracelogs and an instrumentation patch to count specific i915 IRQ
> events, it was confirmed that the majority of the CPU cycles were caused
> by the gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
> majority of the cycles was determined to be processing a specific G2H IRQ
> which was INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. This IRQ is send by
send -> sent

> the GuC in response to the i915 KMD sending the H2G requests
> INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET to the GuC. That request is sent
> when the context is idle to unpin the context from any GuC access. The
> high CPU utilization % symptom was limiting the density scaling.
>
> Root Cause Analysis: Because the incoming execution buffers were spread
> across 36 different containers (each with multiple contexts) but the
> system in totality was NOT saturated to the max, it was assumed that each
> context was constantly idling between submissions. This was causing thrashing
> of unpinning a context from GuC at one moment, followed by repinning it
> due to incoming workload the very next moment. Both of these event-pairs
> were being triggered across multiple contexts per container, across all
> containers at the rate of > 30 times per sec per context.
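
The unpin/repin thrashing described here is exactly what the delay is meant to break: a resubmission arriving within the delay window cancels the pending disable, so the disable/re-enable H2G round trip never happens. A toy userspace model of that intended behaviour (all names invented, no relation to the real i915 structures):

```c
#include <stdbool.h>

/* Toy model of the delayed sched-disable. Unpinning arms a pending
 * disable instead of disabling immediately; a resubmission before the
 * delay expires cancels it. */
struct ctx_model {
	bool sched_enabled;
	bool disable_pending;
};

void ctx_unpin(struct ctx_model *c)
{
	/* Instead of sending the disable H2G now, arm the delayed disable. */
	c->disable_pending = true;
}

void ctx_resubmit(struct ctx_model *c)
{
	/* A resubmission within the delay cancels the pending disable,
	 * avoiding the disable/re-enable round trip entirely. */
	c->disable_pending = false;
}

void ctx_delay_expired(struct ctx_model *c)
{
	/* The delay elapsed with no resubmission: disable for real. */
	if (c->disable_pending) {
		c->sched_enabled = false;
		c->disable_pending = false;
	}
}
```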
>
> Metrics: When running this workload without this patch, we measured an average
> of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10 seconds or
> ~10 million times over ~25+ mins. With this patch, the count reduced to ~480
> every 10 seconds or about ~28K over ~10 mins. The improvement observed is
> ~99% for the average counts per 10 seconds.
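
As a back-of-envelope check of the figure quoted above (using the rounded event counts from the commit message):

```c
/* Reduction in per-10s event counts, as a percentage.
 * reduction_percent(69000.0, 480.0) evaluates to ~99.3, consistent
 * with the ~99% improvement quoted in the commit message. */
double reduction_percent(double before, double after)
{
	return (1.0 - after / before) * 100.0;
}
```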
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Acked-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Needs your s-o-b as you are posting the patch.

The code below looks to be the old rev of the patch? This still needs 
updating with the cleanup work?

John.


