Re: [PATCH 4/5] drm/i915: Increase initial busyspin limit

From: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
To: "Mrozek, Michal" <michal.mrozek@intel.com>,
	"intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>,
	"Danecki, Jacek" <jacek.danecki@intel.com>,
	"chris@chris-wilson.co.uk" <chris@chris-wilson.co.uk>,
	"tvrtko.ursulin@linux.intel.com" <tvrtko.ursulin@linux.intel.com>
Cc: "ben@bwidawsk.net" <ben@bwidawsk.net>,
	"Tamminen, Eero T" <eero.t.tamminen@intel.com>
Subject: Re: [PATCH 4/5] drm/i915: Increase initial busyspin limit
Date: Thu, 2 Aug 2018 19:37:36 +0000	[thread overview]
Message-ID: <1533209648.9904.30.camel@intel.com> (raw)
In-Reply-To: <153322167016.23498.16578775043455440988@skylake-alporthouse-com>

On Thu, 2018-08-02 at 15:54 +0100, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-08-02 15:47:37)
> > 
> > On 28/07/2018 17:46, Chris Wilson wrote:
> > > Here we bump the busyspin for the current request to avoid
> > > sleeping and
> > > the cost of both idling and downclocking the CPU.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Sagar Kamble <sagar.a.kamble@intel.com>
> > > Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> > > Cc: Francisco Jerez <currojerez@riseup.net>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > Cc: Ben Widawsky <ben@bwidawsk.net>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/Kconfig.profile | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/Kconfig.profile
> > > b/drivers/gpu/drm/i915/Kconfig.profile
> > > index de394dea4a14..54e0ba4051e7 100644
> > > --- a/drivers/gpu/drm/i915/Kconfig.profile
> > > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > > @@ -1,6 +1,6 @@
> > >   config DRM_I915_SPIN_REQUEST_IRQ
> > >       int
> > > -     default 5 # microseconds
> > > +     default 250 # microseconds
> > >       help
> > >         Before sleeping waiting for a request (GPU operation) to
> > > complete,
> > >         we may spend some time polling for its completion. As the
> > > IRQ may
> > > 
> > 
> > But the histograms from previous patch do not suggest 250us busy
> > spin 
> > gets us into interesting territory. And we have to know the
> > distribution 
> > between IRQ and CS spins.
> 
> This is for a different problem than presented in those histograms.
> This
> is to hide the reclocking that occurred if we sleep.
> 
> Previous justification for the initial spin was to try and hide the
> irq
> setup costs, this is to try and hide the sleep costs.
> 
> The purpose of this patch was just to dip the toe into the water of
> what
> is possible and solicit feedback since we are still waiting for the
> panacea patches to improve cpufreq. (Why it had such a lackluster
> justification.)

This patch somewhat resembles with the issue we just met in userspace
for the opencl and media. Long story short there was CPU% regression
between 2 opencl releases root caused to the following. OpenCL rewrote
the code how they wait for the task completion. Essentially they have
an optimistic spin implemented on the userspace side and logic their is
quite complex: they commit to different duration of the spin depending
on the scenario. And they had some pretty long spin, like 10ms. This is
a killer thing if opencl is coupled with media workloads, especially
when media is dominant. That's because usual execution time of media
batches is pretty long, few or dozen milliseconds is quite typical. As
a result opencl workload is really being submitted and executed with
significant delay, thus optimistic spin gives up and it comes with the
CPU% cost. This story was discussed here: https://github.com/intel/comp
ute-runtime/commit/d53e1c3979f6d822f55645bfcd753be1658f53ca#comments.

I was actually surprised why they don't rely on i915 wait function.
They responded in was: "I would gladly always use bo_wait ioctl, but it
has big overhead for short workloads." Thus the question: do we have an
opportunity to improve? Can we have some dynamic way to configure
spinning for the particular requests/contexts? /if I understand
correctly current patch it introduces kernel config-level setting
rather than some dynamic setting/

> -Chris
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx