Intel-GFX Archive on lore.kernel.org
From: John Harrison <john.c.harrison@intel.com>
To: Alan Previn <alan.previn.teres.alexis@intel.com>,
	<intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero
Date: Thu, 28 Jul 2022 13:19:53 -0700	[thread overview]
Message-ID: <e9ab17b2-8068-cdc8-8b77-15c39b0a80ba@intel.com> (raw)
In-Reply-To: <20220628055130.1117146-3-alan.previn.teres.alexis@intel.com>

On 6/27/2022 22:51, Alan Previn wrote:
> From: Matthew Brost <matthew.brost@intel.com>
>
> Add a delay, configurable via debugs (default 100ms), to disable
debugs -> debugfs

Default is now 34ms?

> scheduling of a context after the pin count goes to zero. Disable
> scheduling is somewhat costly operation so the idea is a delay allows
costly operation as it requires synchronising with the GuC. So the idea

> the resubmit something before doing this operation. This delay is only
the user to resubmit

> done if the context isn't close and less than 3/4 of the guc_ids are in
close -> closed

less than a given threshold (default is 3/4) of the guc_ids

> use.
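
For readers following the thread: the gating described above can be sketched roughly as below. This is an illustrative userspace sketch, not the actual i915 code; the function and parameter names are invented, and the 3/4 threshold is the quoted default (the review notes it is configurable).

```c
#include <stdbool.h>

/*
 * Illustrative sketch of the gating described above: the sched-disable
 * H2G is delayed only when the context is still open and guc_id usage
 * is below the threshold (default 3/4 of the pool). Names are invented,
 * not the actual i915 symbols.
 */
bool should_delay_sched_disable(bool context_closed,
				unsigned int guc_ids_in_use,
				unsigned int guc_ids_total)
{
	/* A closed context will never be resubmitted, so disable now. */
	if (context_closed)
		return false;

	/* Under guc_id pressure, free the id immediately for new contexts. */
	if (4 * guc_ids_in_use >= 3 * guc_ids_total)
		return false;

	return true;
}
```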
>
> As temporary WA disable this feature for the selftests. Selftests are
> very timing sensitive and any change in timing can cause failure. A
> follow up patch will fixup the selftests to understand this delay.
>
> Alan Previn: Matt Brost first introduced this series back in Oct 2021.
> However no real world workload with measured performance impact was
> available to prove the intended results. Today, this series is being
> republished in response to a real world workload that benefited greatly
> from it along with measured performance improvement.
>
> Workload description: 36 containers were created on a DG2 device where
> each container was performing a combination of 720p 3d game rendering
> and 30fps video encoding. The workload density was configured in way
> that guaranteed each container to ALWAYS be able to render and
> encode no less than 30fps with a predefined maximum render + encode
> latency time. That means that the totality of all 36 containers and its
> workloads were not saturating the utilized hw engines to its max
> (in order to maintain just enough headroom to meet the minimum fps and
> latencies of incoming container submissions).
>
> Problem statement: It was observed that the CPU utilization of the CPU
> core that was pinned to i915 soft IRQ work was experiencing severe load.
> Using tracelogs and an instrumentation patch to count specific i915 IRQ
> events, it was confirmed that the majority of the CPU cycles were caused
> by the gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
> majority of the cycles was determined to be processing a specific G2H IRQ
> which was INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. This IRQ is send by
send -> sent

> the GuC in response to the i915 KMD sending the H2G requests
> INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET to the GuC. That request is sent
> when the context is idle to unpin the context from any GuC access. The
> high CPU utilization % symptom was limiting the density scaling.
>
> Root Cause Analysis: Because the incoming execution buffers were spread
> across 36 different containers (each with multiple contexts) but the
> system in totality was NOT saturated to the max, it was assumed that each
> context was constantly idling between submissions. This was causing thrashing
> of unpinning a context from GuC at one moment, followed by repinning it
> due to incoming workload the very next moment. Both of these event-pairs
> were being triggered across multiple contexts per container, across all
> containers at the rate of > 30 times per sec per context.
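
The unpin/repin thrashing described here is exactly what the delay is meant to break: a resubmission arriving within the delay window cancels the pending disable, so the disable/re-enable H2G round trip never happens. A toy userspace model of that intended behaviour (all names invented, no relation to the real i915 structures):

```c
#include <stdbool.h>

/* Toy model of the delayed sched-disable. Unpinning arms a pending
 * disable instead of disabling immediately; a resubmission before the
 * delay expires cancels it. */
struct ctx_model {
	bool sched_enabled;
	bool disable_pending;
};

void ctx_unpin(struct ctx_model *c)
{
	/* Instead of sending the disable H2G now, arm the delayed disable. */
	c->disable_pending = true;
}

void ctx_resubmit(struct ctx_model *c)
{
	/* A resubmission within the delay cancels the pending disable,
	 * avoiding the disable/re-enable round trip entirely. */
	c->disable_pending = false;
}

void ctx_delay_expired(struct ctx_model *c)
{
	/* The delay elapsed with no resubmission: disable for real. */
	if (c->disable_pending) {
		c->sched_enabled = false;
		c->disable_pending = false;
	}
}
```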
>
> Metrics: When running this workload without this patch, we measured an average
> of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10 seconds or
> ~10 million times over ~25+ mins. With this patch, the count reduced to ~480
> every 10 seconds or about ~28K over ~10 mins. The improvement observed is
> ~99% for the average counts per 10 seconds.
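
As a back-of-envelope check of the figure quoted above (using the rounded event counts from the commit message):

```c
/* Reduction in per-10s event counts, as a percentage.
 * reduction_percent(69000.0, 480.0) evaluates to ~99.3, consistent
 * with the ~99% improvement quoted in the commit message. */
double reduction_percent(double before, double after)
{
	return (1.0 - after / before) * 100.0;
}
```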
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Acked-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Needs your s-o-b as you are posting the patch.

The code below looks to be the old rev of the patch? This still needs 
updating with the cleanup work?

John.


