From: Francisco Jerez
To: Chris Wilson, intel-gfx@lists.freedesktop.org
Cc: Eero Tamminen
Subject: Re: [PATCH] drm/i915: Do not use iowait while waiting for the GPU
Date: Sun, 29 Jul 2018 12:29:42 -0700
Message-ID: <87d0v5g7rt.fsf@riseup.net>
In-Reply-To: <153281151580.24377.10340169753397679886@skylake-alporthouse-com>

Chris Wilson writes:

> Quoting Francisco Jerez (2018-07-28 21:18:50)
>> Chris Wilson writes:
>>
>> > Quoting Francisco Jerez (2018-07-28 06:20:12)
>> >> Chris Wilson writes:
>> >>
>> >> > A recent trend for cpufreq is to boost the CPU frequencies for
>> >> > iowaiters, in particular to benefit high frequency I/O. We do the same
>> >> > and boost the GPU clocks to try and minimise time spent waiting for the
>> >> > GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
>> >> > frequency will result in the GPU being throttled and its frequency being
>> >> > reduced.
>> >> > Thus declaring iowait negatively impacts on GPU throughput.
>> >> >
>> >> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
>> >> > References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
>> >>
>> >> This patch causes up to ~13% performance regressions (with significance
>> >> 5%) on several latency-sensitive tests on my BXT:
>> >>
>> >> jxrendermark/rendering-test=Linear Gradient Blend/rendering-size=128x128: XXX ±35.69% x53 -> XXX ±32.57% x61 d=-13.52% ±31.88% p=2.58%
>> >
>>
>> The jxrendermark Linear Gradient Blend test-case had probably the
>> smallest effect size of all the regressions I noticed... Can you take a
>> look at any of the other ones instead?
>
> It was the biggest in the list, was it not? I didn't observe anything of
> note in a quick look at x11perf, but didn't let it run for a good sample
> size. They didn't seem to be as relevant as jxrendermark so I went and
> dug that out.
>

That was the biggest regression in absolute value, but the smallest in
effect size (roughly 0.4 standard deviations).

>> > Curious, as this is just a bunch of composites and as with the others,
>> > should never be latency sensitive (at least under bare X11).
>>
>> They are largely latency-sensitive due to the poor pipelining they seem
>> to achieve between their GPU rendering work and the X11 thread.
>
> Only the X11 thread is touching the GPU, and in the cases I looked at
> it, we were either waiting for the ring to drain or on throttling.
> Synchronisation with the GPU was only for draining the queue on timing,
> and the cpu was able to stay ahead during the benchmark.
>

Apparently the CPU doesn't get far enough ahead for the GPU to be
consistently loaded, which prevents us from hiding the latency of the CPU
computation even in those cases.
> Off the top of my head, for X to be latency sensitive you need to mix
> client and Xserver rendering, along the lines of Paint; GetImage, in the
> extreme becoming gem_sync. Adding a compositor is also interesting for
> the context switching will prevent us merging requests (but that all
> depends on the frequency of compositor updates ofc), and we would
> need more CPU and require reasonably low latency (less than the next
> request) to keep the GPU busy. However, that is driven directly off
> interrupts, iowait isn't a factor -- but your hook could still be useful
> to provide pm_qos.
> -Chris