From: Francisco Jerez
Subject: Re: [PATCH] drm/i915: Do not use iowait while waiting for the GPU
Date: Mon, 30 Jul 2018 11:55:07 -0700
Message-ID: <87fu00eepg.fsf@riseup.net>
References: <20180727184312.29937-1-chris@chris-wilson.co.uk>
 <87pnz8gcmr.fsf@riseup.net>
 <153278776733.24377.4575869668307623950@skylake-alporthouse-com>
 <87tvojf711.fsf@riseup.net>
 <153281151580.24377.10340169753397679886@skylake-alporthouse-com>
 <87d0v5g7rt.fsf@riseup.net>
 <153295539046.10833.13897747091422977680@skylake-alporthouse-com>
In-Reply-To: <153295539046.10833.13897747091422977680@skylake-alporthouse-com>
To: Chris Wilson, intel-gfx@lists.freedesktop.org
Cc: Eero Tamminen
List-Id: intel-gfx@lists.freedesktop.org

Chris Wilson writes:

> Quoting Francisco Jerez (2018-07-29 20:29:42)
>> Chris Wilson writes:
>>
>> > Quoting Francisco Jerez (2018-07-28 21:18:50)
>> >> Chris Wilson writes:
>> >>
>> >> > Quoting Francisco Jerez (2018-07-28 06:20:12)
>> >> >> Chris Wilson writes:
>> >> >>
>> >> >> > A recent trend for cpufreq is to boost the CPU frequencies for
>> >> >> > iowaiters, in particular to benefit high frequency I/O. We do the same
>> >> >> > and boost the GPU clocks to try and minimise time spent waiting for the
>> >> >> > GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
>> >> >> > frequency will result in the GPU being throttled and its frequency being
>> >> >> > reduced. Thus declaring iowait negatively impacts on GPU throughput.
>> >> >> >
>> >> >> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
>> >> >> > References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
>> >> >>
>> >> >> This patch causes up to ~13% performance regressions (with significance
>> >> >> 5%) on several latency-sensitive tests on my BXT:
>> >> >>
>> >> >> jxrendermark/rendering-test=Linear Gradient Blend/rendering-size=128x128: XXX ±35.69% x53 -> XXX ±32.57% x61 d=-13.52% ±31.88% p=2.58%
>> >> >
>> >>
>> >> The jxrendermark Linear Gradient Blend test-case probably had the
>> >> smallest effect size of all the regressions I noticed... Can you take a
>> >> look at any of the other ones instead?
>> >
>> > It was the biggest in the list, was it not? I didn't observe anything of
>> > note in a quick look at x11perf, but didn't let it run for a good sample
>> > size. They didn't seem to be as relevant as jxrendermark so I went and
>> > dug that out.
>> >
>>
>> That was the biggest regression in absolute value, but the smallest in
>> effect size (roughly 0.4 standard deviations).
>
> d=-13.52% wasn't the delta between the two runs?
>

It is, and it is less than half of 31.88%, which is the pooled standard
deviation.

> Sorry, but it appears to be redacted beyond my comprehension.
>
>> >> > Curious, as this is just a bunch of composites and as with the others,
>> >> > should never be latency sensitive (at least under bare X11).
>> >>
>> >> They are largely latency-sensitive due to the poor pipelining they seem
>> >> to achieve between their GPU rendering work and the X11 thread.
>> >
>> > Only the X11 thread is touching the GPU, and in the cases I looked at
>> > it, we were either waiting for the ring to drain or on throttling.
>> > Synchronisation with the GPU was only for draining the queue on timing,
>> > and the CPU was able to stay ahead during the benchmark.
>> >
>>
>> Apparently the CPU doesn't get ahead enough for the GPU to be
>> consistently loaded, which prevents us from hiding the latency of the
>> CPU computation even in those cases.
>
> The curse of reproducibility. On my bxt, I don't see the issue, so we
> have a significant difference in setup.
> -Chris
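The effect-size point in this thread (a -13.52% shift measured against a 31.88% pooled standard deviation, i.e. "roughly 0.4 standard deviations") can be checked with a few lines of arithmetic. This is an illustrative reconstruction, not the benchmarking tool's own code. It assumes the quoted ±31.88% is the pooled standard deviation of the x53 and x61 samples after both relative spreads are re-expressed against the baseline mean, and that "effect size" means the mean shift divided by that pooled standard deviation.

```python
import math


def pooled_sd(s1: float, n1: int, s2: float, n2: int) -> float:
    """Pooled standard deviation of two samples, using (n - 1) weighting."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


# Figures copied from the jxrendermark result line, as fractions.
REL_SD_BEFORE, N_BEFORE = 0.3569, 53  # ±35.69% x53
REL_SD_AFTER, N_AFTER = 0.3257, 61    # ±32.57% x61
DELTA = -0.1352                       # d=-13.52%

# Assumption: the "after" spread is relative to the "after" mean, which is
# (1 + DELTA) times the baseline, so rescale it to the baseline mean first.
sd_after_vs_baseline = REL_SD_AFTER * (1 + DELTA)
sp = pooled_sd(REL_SD_BEFORE, N_BEFORE, sd_after_vs_baseline, N_AFTER)

# Standardised effect size: the mean shift in units of pooled SD.
effect = abs(DELTA) / sp

print(f"pooled SD   = {sp:.2%}")         # pooled SD   = 31.88%
print(f"effect size = {effect:.2f} SD")  # effect size = 0.42 SD
```

Pooling the raw ±35.69% and ±32.57% without rescaling gives about 34% instead, which is why the rescaling step is stated as an assumption. Either way, the standardised shift stays below half a standard deviation.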