Re: [PATCH 4/4] drm/i915: Opportunistically reduce flushing at execbuf

From: Ben Widawsky <ben@bwidawsk.net>
To: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
Cc: Intel GFX <intel-gfx@lists.freedesktop.org>,
	DRI Development <dri-devel@lists.freedesktop.org>,
	Ben Widawsky <benjamin.widawsky@intel.com>
Subject: Re: [PATCH 4/4] drm/i915: Opportunistically reduce flushing at execbuf
Date: Sun, 14 Dec 2014 15:37:36 -0800
Message-ID: <20141214233735.GA1497@bwidawsk.net>
In-Reply-To: <20141214131221.GD10649@intel.com>

On Sun, Dec 14, 2014 at 03:12:21PM +0200, Ville Syrjälä wrote:
> On Sat, Dec 13, 2014 at 07:08:24PM -0800, Ben Widawsky wrote:
> > If we're moving a bunch of buffers from the CPU domain to the GPU domain, and
> > we've already blown out the entire cache via a wbinvd, there is nothing more to
> > do.
> > 
> > With this and the previous patches, I am seeing a 3x FPS increase on a certain
> > benchmark which uses a giant 2d array texture. Unless I missed something in the
> > code, it should only affect non-LLC i915 platforms.
> > 
> > I haven't yet run any numbers for other benchmarks, nor have I attempted to
> > check if various conformance tests still pass.
> > 
> > NOTE: As mentioned in the previous patch, if one can easily obtain the largest
> > buffer and attempt to flush it first, the results would be even more desirable.
> 
> So even with that optimization if you only have tons of small buffers
> that need to be flushed you'd still take the clflush path for every
> single one.
> 
> How difficult would it be to calculate the total size to be flushed first,
> and then make the clflush vs. wbinvd decision based on that?
> 

I'll write the patch and send it to Eero for test.

It's not hard, and I think that's a good idea as well. One reason I didn't put
such code in this series is that it moves away from a global DRM solution (and
like I said in the cover letter, I am fine with that). Implementing this, I
think in the i915 code we'd just iterate through the BOs until we got to a
certain threshold, then just call wbinvd() from i915 and not even bother with
drm_cache. You could also maybe try to shortcut if there are more than X
buffers.
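
Roughly, the threshold walk could look like the untested sketch below. It is
plain standalone C rather than i915 code, and every name and number in it
(struct bo, choose_flush_strategy(), WBINVD_THRESHOLD_BYTES,
MAX_BOS_BEFORE_WBINVD) is invented for illustration; real limits would have to
come from profiling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; not the i915 struct. */
struct bo {
	size_t size;		/* bytes */
	bool needs_flush;	/* still dirty in the CPU domain */
};

enum flush_strategy { FLUSH_NONE, FLUSH_CLFLUSH, FLUSH_WBINVD };

/* Assumed tunables, chosen arbitrarily for this sketch. */
#define WBINVD_THRESHOLD_BYTES	((size_t)8 << 20)
#define MAX_BOS_BEFORE_WBINVD	256u

static enum flush_strategy
choose_flush_strategy(const struct bo *bos, size_t count)
{
	size_t total = 0, dirty = 0, i;

	assert(bos != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (!bos[i].needs_flush)
			continue;

		total += bos[i].size;
		dirty++;

		/* Stop walking as soon as either limit is crossed. */
		if (total >= WBINVD_THRESHOLD_BYTES ||
		    dirty > MAX_BOS_BEFORE_WBINVD)
			return FLUSH_WBINVD;
	}

	/* Below both limits: flush each dirty BO individually, if any. */
	return dirty ? FLUSH_CLFLUSH : FLUSH_NONE;
}
```

The early return is the point: once the limit is hit there is no reason to
look at the remaining BOs.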

However, for what you describe, I think it might make more sense to let
userspace specify an execbuf flag to do the wbinvd(). Userspace can trivially
determine such info, and it avoids having to iterate through the buffers an
extra time in the kernel.
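
As a rough idea of the userspace side, here is an untested sketch. No such
execbuf flag exists today; HYPOTHETICAL_EXEC_FLUSH_ALL, struct submit_state,
note_cpu_write(), build_exec_flags() and the 8 MiB cutoff are all made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up flag bit; NOT part of any real execbuf interface. */
#define HYPOTHETICAL_EXEC_FLUSH_ALL	(1u << 0)

/* Assumed cutoff above which userspace asks for a full cache flush. */
#define FLUSH_ALL_HINT_BYTES		((size_t)8 << 20)

/* Per-batch bookkeeping the driver would keep while building a batch. */
struct submit_state {
	size_t cpu_dirty_bytes;	/* bytes written through CPU maps */
};

/* Called whenever the CPU writes to a buffer used by the next batch. */
static void note_cpu_write(struct submit_state *s, size_t bytes)
{
	assert(s != NULL);
	s->cpu_dirty_bytes += bytes;
}

/* Compute flags for the next submission and reset the accounting. */
static uint32_t build_exec_flags(struct submit_state *s)
{
	uint32_t flags = 0;

	assert(s != NULL);
	if (s->cpu_dirty_bytes >= FLUSH_ALL_HINT_BYTES)
		flags |= HYPOTHETICAL_EXEC_FLUSH_ALL;

	s->cpu_dirty_bytes = 0;
	return flags;
}
```

Userspace already knows what it wrote, so the accounting is a running sum and
the kernel never has to walk the buffer list just to size the flush.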

I wonder whether clflushing many small objects is showing up in profiles. So
far, this specific microbenchmark is the only profile I've seen where the
clflushes show up.

Thanks.

[snip]
