From mboxrd@z Thu Jan 1 00:00:00 1970
From: Francisco Jerez
Subject: Re: [PATCH] drm/i915: Do not use iowait while waiting for the GPU
Date: Fri, 27 Jul 2018 22:20:12 -0700
Message-ID: <87pnz8gcmr.fsf@riseup.net>
References: <20180727184312.29937-1-chris@chris-wilson.co.uk>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0295949373=="
Received: from mx1.riseup.net (mx1.riseup.net [198.252.153.129])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 58BC56EB15;
 Sat, 28 Jul 2018 05:37:09 +0000 (UTC)
In-Reply-To: <20180727184312.29937-1-chris@chris-wilson.co.uk>
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"
To: Chris Wilson, intel-gfx@lists.freedesktop.org
Cc: Eero Tamminen
List-Id: intel-gfx@lists.freedesktop.org

--===============0295949373==
Content-Type: multipart/signed; boundary="==-=-=";
 micalg=pgp-sha256; protocol="application/pgp-signature"

--==-=-=
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Chris Wilson writes:

> A recent trend for cpufreq is to boost the CPU frequencies for
> iowaiters, in particular to benefit high frequency I/O. We do the same
> and boost the GPU clocks to try and minimise time spent waiting for the
> GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
> frequency will result in the GPU being throttled and its frequency being
> reduced. Thus declaring iowait negatively impacts on GPU throughput.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
> References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")

This patch causes performance regressions of up to ~13% (at a 5%
significance level) in several latency-sensitive tests on my BXT:

 jxrendermark/rendering-test=Linear Gradient Blend/rendering-size=128x128:      XXX ±35.69% x53 -> XXX ±32.57% x61 d=-13.52% ±31.88% p=2.58%
 jxrendermark/rendering-test=Transformed Blit Bilinear/rendering-size=128x128:  XXX ±3.51% x21 -> XXX ±3.77% x21 d=-12.08% ±3.41% p=0.00%
 gtkperf/gtk-test=GtkComboBox:                                                  XXX ±1.90% x19 -> XXX ±1.59% x20 d=-4.74% ±1.71% p=0.00%
 x11perf/test=500px Compositing From Pixmap To Window:                          XXX ±2.35% x21 -> XXX ±1.73% x21 d=-2.69% ±2.04% p=0.01%
 qgears2/render-backend=XRender Extension/test-mode=Text:                       XXX ±0.38% x21 -> XXX ±0.40% x25 d=-2.20% ±0.38% p=0.00%
 x11perf/test=500px Compositing From Pixmap To Window:                          XXX ±2.78% x53 -> XXX ±2.27% x61 d=-1.77% ±2.50% p=0.03%

It's unsurprising to see latency-sensitive workloads rely on the lower
latency offered by io_schedule_timeout(): without it, the CPUFREQ
governor has a substantial downward bias in response to the
intermittent CPU usage pattern of those benchmarks.

We could possibly get the best of both worlds if the CPUFREQ governor
didn't attempt to EPP-boost the CPU frequency on IOWAIT while the
system is heavily IO-bound. When both conditions occur simultaneously,
the CPU workload is likely to be IO-bound as well, so boosting the CPU
frequency will leave its performance unchanged and can only pessimize
the performance of the system. This could be achieved by using the
statistic implemented here [1].

I think the offending patch should probably be reverted for the time
being...
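To make the suggestion about suppressing the IOWAIT boost on an IO-bound
system a bit more concrete, here is a rough userspace sketch of the
decision only. It is not the intel_pstate code and not the statistic
referenced in the reply: the function name, the io_busy_pct input and
the 80% threshold are all hypothetical and only illustrate the shape of
the heuristic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold: at or above this share of time spent with IO
 * in flight, the system is treated as heavily IO-bound. The value is an
 * assumption made for illustration. */
#define IO_BOUND_THRESHOLD_PCT 80u

/*
 * Decide whether to boost the CPU frequency for a task waking from iowait.
 *
 * task_in_iowait: the waking task was accounted as waiting on IO.
 * io_busy_pct:    hypothetical stand-in (0..100) for an "IO active time"
 *                 statistic; how it is measured is outside this sketch.
 */
static bool should_boost_on_iowait(bool task_in_iowait,
				   unsigned int io_busy_pct)
{
	assert(io_busy_pct <= 100);

	if (!task_in_iowait)
		return false;	/* no iowait wakeup, nothing to boost for */

	/* IOWAIT on a heavily IO-bound system suggests the CPU is not the
	 * bottleneck, so a boost would only take shared power budget away
	 * from the IO device (here, the GPU). */
	return io_busy_pct < IO_BOUND_THRESHOLD_PCT;
}
```

With these assumed numbers, an iowait wakeup at 10% IO activity would
still be boosted, while the same wakeup at 95% IO activity would not.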
[1] https://patchwork.kernel.org/patch/10312259/

> Signed-off-by: Chris Wilson
> Cc: Tvrtko Ursulin
> Cc: Joonas Lahtinen
> Cc: Eero Tamminen
> Cc: Francisco Jerez
> ---
>  drivers/gpu/drm/i915/i915_request.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 5c2c93cbab12..7ef7ade12073 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1330,7 +1330,7 @@ long i915_request_wait(struct i915_request *rq,
>  			goto complete;
>  		}
>  
> -		timeout = io_schedule_timeout(timeout);
> +		timeout = schedule_timeout(timeout);
>  	} while (1);
>  
>  	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
> -- 
> 2.18.0

--=-=-=--

--==-=-=--

--===============0295949373==--