From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ville =?iso-8859-1?Q?Syrj=E4l=E4?= Subject: Re: [PATCH 3/4] drm/i915: Update rules for writing through the LLC with the cpu Date: Tue, 6 Aug 2013 22:08:38 +0300 Message-ID: <20130806190838.GN5004@intel.com> References: <1375791425-14978-1-git-send-email-chris@chris-wilson.co.uk> <1375791425-14978-3-git-send-email-chris@chris-wilson.co.uk> <20130806142425.GH5004@intel.com> <20130806161614.GA6075@cantiga.alporthouse.com> <20130806180301.GL5004@intel.com> <20130806184540.GA8181@cantiga.alporthouse.com> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Return-path: Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by gabe.freedesktop.org (Postfix) with ESMTP id C1EBFE5D33 for ; Tue, 6 Aug 2013 12:08:41 -0700 (PDT) Content-Disposition: inline In-Reply-To: <20130806184540.GA8181@cantiga.alporthouse.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org To: Chris Wilson , intel-gfx@lists.freedesktop.org List-Id: intel-gfx@lists.freedesktop.org On Tue, Aug 06, 2013 at 07:45:40PM +0100, Chris Wilson wrote: > On Tue, Aug 06, 2013 at 09:03:01PM +0300, Ville Syrj=E4l=E4 wrote: > > On Tue, Aug 06, 2013 at 05:16:14PM +0100, Chris Wilson wrote: > > > On Tue, Aug 06, 2013 at 05:24:25PM +0300, Ville Syrj=E4l=E4 wrote: > > > > On Tue, Aug 06, 2013 at 01:17:04PM +0100, Chris Wilson wrote: > > > > > +static bool cpu_write_needs_clflush(struct drm_i915_gem_object *= obj) > > > > > +{ > > > > > + if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level)) > > > > > + return true; > > > > > + > > > > > + return atomic_read(&obj->fb_count) > 0; > > > > > +} > > > > = > > > > I was thinking a bit about this fb_count thing. The other option wo= uld > > > > be to track actual scanout activity, which wouldn't be hard to do, = but > > > > I guess that could shift a bunch of the clflush cost to page flip t= ime. > > > > So maybe we don't want that? > > > = > > > Different series, see the frontbuffer write tracking. As for page fli= p, > > > that is currently the only time we clflush... > > > = > > > > Also i915_gem_sw_finish_ioctl() still looks at pin_count as an > > > > indication whether scanout is happening. We should add an fb_count > > > > check there as well. Maybe we should leave the pin_count check ther= e, > > > > so that framebuffers that aren't being scanned out currently can be > > > > massaged a bit more freely before a clflush needs to be issued? > > > = > > > It does do a fb_count check, the pin_count + fb_count is a really str= ong > > > indicator of active scanout. > > = > > The fb_count checks is done only through the eventual check in > > i915_gem_clflush_object() which is why I didn't think of it. > > = > > On non-LLC platforms it also ends up doing the flush for all > > non-snooped non-scanout buffers, which would get optimized away if the > > fb_count check was where the pin_count check is. > = > But it does still do the fb_count check before the flush? What am I > missing here? I believe the code now ends up doing this (on non-LLC): if (pin_count && (cache_level =3D=3D UC || fb_count > 0) && write_domain = =3D=3D CPU) clflush(); So it'll flush also all pinned non-snooped buffers whether they be scanout or not. We could delay such flushes until the bo gets moved away from the CPU write domain. But maybe it's just a nonsense scenario. Why would someone write to a pinned bo unless the pinning is due to scanout? > > > > > - obj->tiling_mode =3D=3D I915_TILING_NONE && > > > > > - obj->base.write_domain !=3D I915_GEM_DOMAIN_CPU) { > > > > > + if (obj->tiling_mode =3D=3D I915_TILING_NONE && > > > > > + !cpu_write_needs_clflush(obj)) { > > > > = > > > > This would break non-LLC platforms, I think. Before, GTT writes were > > > > used for non-snooped buffers w/ clean CPU caches (write domain !=3D= cpu), > > > > which makes sense since access via GTT doesn't snoop ever according > > > > to the docs. Now I think it'll do GTT writes for snooped non-scanout > > > > buffers. > > > = > > > D'oh. What I was aiming for was to always use shmem writes on snooped > > > bo, irrespective of whether it is GPU hot. > > > = > > > s/!cpu_write_needs_clflush/cpu_write_needs_clflush/ > > = > > So that means a GTT write for anything non-snooped or scanout. If it's > > snooped scanout (I guess we may not really care about such a weird > > combination), or non-snooped but in CPU write domain, that would > > still have issues with possibly dirty data in the CPU caches. > = > Hmm, that should be fixed up by i915_gem_gtt_pwrite_fast() as we flush > all the caches before we start overwriting. I've thoroughly confused > myself over which paths do the fancy tricks. :( > = > > Might be intersting to try on real hardware what the GTT access > > snooping behaviour really is, especially for some modern non-LLC > > system like VLV. > = > I remember why I have the !DOMAIN_CPU test now. If we have already paid > the price to pull it back into the CPU domain, let it rest there until > it is used again. > = > So as it stands, I think we want: > = > if (obj->tiling_mode =3D=3D I915_TILING_NONE && > obj->base.write_domain !=3D I915_GEM_DOMAIN_CPU && > cpu_write_needs_clflush(obj)) > i915_gem_gtt_pwrite_fast(); Yeah that looks better to me at least. -- = Ville Syrj=E4l=E4 Intel OTC