Re: [PATCH 3/4] drm/i915: Update rules for writing through the LLC with the cpu

intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed

From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/4] drm/i915: Update rules for writing through the LLC with the cpu
Date: Tue, 6 Aug 2013 17:24:25 +0300	[thread overview]
Message-ID: <20130806142425.GH5004@intel.com> (raw)
In-Reply-To: <1375791425-14978-3-git-send-email-chris@chris-wilson.co.uk>

On Tue, Aug 06, 2013 at 01:17:04PM +0100, Chris Wilson wrote:
> As mentioned in the previous commit, reads and writes from both the CPU
> and GPU go through the LLC. This gives us coherency between the CPU and
> GPU irrespective of the attribute settings either device sets. We can
> use to avoid having to clflush even uncached memory.
> 
> Except for the scanout.
> 
> The scanout resides within another functional block that does not use
> the LLC but reads directly from main memory. So in order to maintain
> coherency with the scanout, writes to uncached memory must be flushed.
> In order to optimize writes elsewhere, we start tracking whether an
> framebuffer is attached to an object.
> 
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_drv.h      |  3 +++
>  drivers/gpu/drm/i915/i915_gem.c      | 27 +++++++++++++++++----------
>  drivers/gpu/drm/i915/intel_display.c |  3 +++
>  3 files changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index aa11731..93c4789 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1407,6 +1407,9 @@ struct drm_i915_gem_object {
>  	/** for phy allocated objects */
>  	struct drm_i915_gem_phys_object *phys_obj;
>  
> +	/** Track framebuffers associated with this object */
> +	atomic_t fb_count;
> +
>  	union {
>  		struct i915_gem_userptr {
>  			uintptr_t ptr;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5671dab..9805693 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -66,6 +66,14 @@ static bool cpu_cache_is_coherent(struct drm_device *dev,
>  	return HAS_LLC(dev) || level != I915_CACHE_NONE;
>  }
>  
> +static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> +{
> +	if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
> +		return true;
> +
> +	return atomic_read(&obj->fb_count) > 0;
> +}

I was thinking a bit about this fb_count thing. The other option would
be to track actual scanout activity, which wouldn't be hard to do, but
I guess that could shift a bunch of the clflush cost to page flip time.
So maybe we don't want that?

Also i915_gem_sw_finish_ioctl() still looks at pin_count as an
indication whether scanout is happening. We should add an fb_count
check there as well. Maybe we should leave the pin_count check there,
so that framebuffers that aren't being scanned out currently can be
massaged a bit more freely before a clflush needs to be issued?

> +
>  static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
>  {
>  	if (obj->tiling_mode)
> @@ -832,8 +840,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
>  		 * write domain and manually flush cachelines (if required). This
>  		 * optimizes for the case when the gpu will use the data
>  		 * right away and we therefore have to clflush anyway. */
> -		if (obj->cache_level == I915_CACHE_NONE)
> -			needs_clflush_after = 1;
> +		needs_clflush_after = cpu_write_needs_clflush(obj);
>  		if (i915_gem_obj_ggtt_bound(obj)) {
>  			ret = i915_gem_object_set_to_gtt_domain(obj, true);
>  			if (ret)
> @@ -1001,9 +1008,8 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>  		goto out;
>  	}
>  
> -	if (obj->cache_level == I915_CACHE_NONE &&
> -	    obj->tiling_mode == I915_TILING_NONE &&
> -	    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> +	if (obj->tiling_mode == I915_TILING_NONE &&
> +	    !cpu_write_needs_clflush(obj)) {

This would break non-LLC platforms, I think. Before, GTT writes were
used for non-snooped buffers w/ clean CPU caches (write domain != cpu),
which makes sense since access via GTT doesn't snoop ever according
to the docs. Now I think it'll do GTT writes for snooped non-scanout
buffers.

>  		ret = i915_gem_gtt_pwrite_fast(dev, obj, args, file);
>  		/* Note that the gtt paths might fail with non-page-backed user
>  		 * pointers (e.g. gtt mappings when moving data between
> @@ -3299,7 +3305,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj)
>  	 * snooping behaviour occurs naturally as the result of our domain
>  	 * tracking.
>  	 */
> -	if (obj->cache_level != I915_CACHE_NONE)
> +	if (!cpu_write_needs_clflush(obj))
>  		return;

I would actually prefer to kill this check completely, and move the
decision whether to do a clflush to the callers.

There are three places that I spotted which lack such a check;
i915_gem_execbuffer_move_to_gpu() and i915_gem_object_set_to_cpu_domain()
need to flush only when !cpu_cache_is_coherent(), whereas
i915_gem_object_flush_cpu_write_domain() needs to flush for scanout as
well.

>  
>  	trace_i915_gem_object_clflush(obj);
> @@ -3462,7 +3468,11 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  					       obj, cache_level);
>  	}
>  
> -	if (cache_level == I915_CACHE_NONE) {
> +	list_for_each_entry(vma, &obj->vma_list, vma_link)
> +		vma->node.color = cache_level;
> +	obj->cache_level = cache_level;
> +
> +	if (cpu_write_needs_clflush(obj)) {
>  		u32 old_read_domains, old_write_domain;
>  
>  		/* If we're coming from LLC cached, then we haven't
> @@ -3485,9 +3495,6 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  						    old_write_domain);
>  	}
>  
> -	list_for_each_entry(vma, &obj->vma_list, vma_link)
> -		vma->node.color = cache_level;
> -	obj->cache_level = cache_level;
>  	i915_gem_verify_gtt(dev);
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 0584480..cc5eaba 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -9443,6 +9443,8 @@ static void intel_user_framebuffer_destroy(struct drm_framebuffer *fb)
>  	struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
>  
>  	drm_framebuffer_cleanup(fb);
> +
> +	atomic_dec(&intel_fb->obj->fb_count);

Where did the the other atomic_dec() go (was in intel_fbdev_destroy())?

>  	drm_gem_object_unreference_unlocked(&intel_fb->obj->base);
>  
>  	kfree(intel_fb);
> @@ -9568,6 +9570,7 @@ int intel_framebuffer_init(struct drm_device *dev,
>  		return ret;
>  	}
>  
> +	atomic_inc(&obj->fb_count);
>  	return 0;
>  }
>  
> -- 
> 1.8.4.rc1

-- 
Ville Syrjälä
Intel OTC

next prev parent reply	other threads:[~2013-08-06 14:24 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-06 12:17 [PATCH 1/4] drm/i915: Rename I915_CACHE_MLC_LLC to L3_LLC for Ivybridge Chris Wilson
2013-08-06 12:17 ` [PATCH 2/4] drm/i915: Update rules for reading cache lines through the LLC Chris Wilson
2013-08-06 13:31   ` Daniel Vetter
2013-08-06 16:20     ` Chris Wilson
2013-08-06 12:17 ` [PATCH 3/4] drm/i915: Update rules for writing through the LLC with the cpu Chris Wilson
2013-08-06 14:24   ` Ville Syrjälä [this message]
2013-08-06 16:16     ` Chris Wilson
2013-08-06 18:03       ` Ville Syrjälä
2013-08-06 18:45         ` Chris Wilson
2013-08-06 19:08           ` Ville Syrjälä
2013-08-06 19:26             ` Chris Wilson
2013-08-06 12:17 ` [PATCH 4/4] drm/i915: Only do a chipset flush after a clflush Chris Wilson
2013-08-06 14:25 ` [PATCH 1/4] drm/i915: Rename I915_CACHE_MLC_LLC to L3_LLC for Ivybridge Ville Syrjälä
2013-08-06 14:36   ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130806142425.GH5004@intel.com \
    --to=ville.syrjala@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).