From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/4] drm/i915: Update rules for writing through the LLC with the cpu
Date: Tue, 6 Aug 2013 17:24:25 +0300 [thread overview]
Message-ID: <20130806142425.GH5004@intel.com> (raw)
In-Reply-To: <1375791425-14978-3-git-send-email-chris@chris-wilson.co.uk>
On Tue, Aug 06, 2013 at 01:17:04PM +0100, Chris Wilson wrote:
> As mentioned in the previous commit, reads and writes from both the CPU
> and GPU go through the LLC. This gives us coherency between the CPU and
> GPU irrespective of the attribute settings either device sets. We can
> use to avoid having to clflush even uncached memory.
>
> Except for the scanout.
>
> The scanout resides within another functional block that does not use
> the LLC but reads directly from main memory. So in order to maintain
> coherency with the scanout, writes to uncached memory must be flushed.
> In order to optimize writes elsewhere, we start tracking whether an
> framebuffer is attached to an object.
>
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 3 +++
> drivers/gpu/drm/i915/i915_gem.c | 27 +++++++++++++++++----------
> drivers/gpu/drm/i915/intel_display.c | 3 +++
> 3 files changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index aa11731..93c4789 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1407,6 +1407,9 @@ struct drm_i915_gem_object {
> /** for phy allocated objects */
> struct drm_i915_gem_phys_object *phys_obj;
>
> + /** Track framebuffers associated with this object */
> + atomic_t fb_count;
> +
> union {
> struct i915_gem_userptr {
> uintptr_t ptr;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5671dab..9805693 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -66,6 +66,14 @@ static bool cpu_cache_is_coherent(struct drm_device *dev,
> return HAS_LLC(dev) || level != I915_CACHE_NONE;
> }
>
> +static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> +{
> + if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
> + return true;
> +
> + return atomic_read(&obj->fb_count) > 0;
> +}
I was thinking a bit about this fb_count thing. The other option would
be to track actual scanout activity, which wouldn't be hard to do, but
I guess that could shift a bunch of the clflush cost to page flip time.
So maybe we don't want that?
Also i915_gem_sw_finish_ioctl() still looks at pin_count as an
indication whether scanout is happening. We should add an fb_count
check there as well. Maybe we should leave the pin_count check there,
so that framebuffers that aren't being scanned out currently can be
massaged a bit more freely before a clflush needs to be issued?
> +
> static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
> {
> if (obj->tiling_mode)
> @@ -832,8 +840,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
> * write domain and manually flush cachelines (if required). This
> * optimizes for the case when the gpu will use the data
> * right away and we therefore have to clflush anyway. */
> - if (obj->cache_level == I915_CACHE_NONE)
> - needs_clflush_after = 1;
> + needs_clflush_after = cpu_write_needs_clflush(obj);
> if (i915_gem_obj_ggtt_bound(obj)) {
> ret = i915_gem_object_set_to_gtt_domain(obj, true);
> if (ret)
> @@ -1001,9 +1008,8 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
> goto out;
> }
>
> - if (obj->cache_level == I915_CACHE_NONE &&
> - obj->tiling_mode == I915_TILING_NONE &&
> - obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> + if (obj->tiling_mode == I915_TILING_NONE &&
> + !cpu_write_needs_clflush(obj)) {
This would break non-LLC platforms, I think. Before, GTT writes were
used for non-snooped buffers w/ clean CPU caches (write domain != cpu),
which makes sense since access via GTT doesn't snoop ever according
to the docs. Now I think it'll do GTT writes for snooped non-scanout
buffers.
> ret = i915_gem_gtt_pwrite_fast(dev, obj, args, file);
> /* Note that the gtt paths might fail with non-page-backed user
> * pointers (e.g. gtt mappings when moving data between
> @@ -3299,7 +3305,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj)
> * snooping behaviour occurs naturally as the result of our domain
> * tracking.
> */
> - if (obj->cache_level != I915_CACHE_NONE)
> + if (!cpu_write_needs_clflush(obj))
> return;
I would actually prefer to kill this check completely, and move the
decision whether to do a clflush to the callers.
There are three places that I spotted which lack such a check;
i915_gem_execbuffer_move_to_gpu() and i915_gem_object_set_to_cpu_domain()
need to flush only when !cpu_cache_is_coherent(), whereas
i915_gem_object_flush_cpu_write_domain() needs to flush for scanout as
well.
>
> trace_i915_gem_object_clflush(obj);
> @@ -3462,7 +3468,11 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> obj, cache_level);
> }
>
> - if (cache_level == I915_CACHE_NONE) {
> + list_for_each_entry(vma, &obj->vma_list, vma_link)
> + vma->node.color = cache_level;
> + obj->cache_level = cache_level;
> +
> + if (cpu_write_needs_clflush(obj)) {
> u32 old_read_domains, old_write_domain;
>
> /* If we're coming from LLC cached, then we haven't
> @@ -3485,9 +3495,6 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> old_write_domain);
> }
>
> - list_for_each_entry(vma, &obj->vma_list, vma_link)
> - vma->node.color = cache_level;
> - obj->cache_level = cache_level;
> i915_gem_verify_gtt(dev);
> return 0;
> }
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 0584480..cc5eaba 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -9443,6 +9443,8 @@ static void intel_user_framebuffer_destroy(struct drm_framebuffer *fb)
> struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
>
> drm_framebuffer_cleanup(fb);
> +
> + atomic_dec(&intel_fb->obj->fb_count);
Where did the the other atomic_dec() go (was in intel_fbdev_destroy())?
> drm_gem_object_unreference_unlocked(&intel_fb->obj->base);
>
> kfree(intel_fb);
> @@ -9568,6 +9570,7 @@ int intel_framebuffer_init(struct drm_device *dev,
> return ret;
> }
>
> + atomic_inc(&obj->fb_count);
> return 0;
> }
>
> --
> 1.8.4.rc1
--
Ville Syrjälä
Intel OTC
next prev parent reply other threads:[~2013-08-06 14:24 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-06 12:17 [PATCH 1/4] drm/i915: Rename I915_CACHE_MLC_LLC to L3_LLC for Ivybridge Chris Wilson
2013-08-06 12:17 ` [PATCH 2/4] drm/i915: Update rules for reading cache lines through the LLC Chris Wilson
2013-08-06 13:31 ` Daniel Vetter
2013-08-06 16:20 ` Chris Wilson
2013-08-06 12:17 ` [PATCH 3/4] drm/i915: Update rules for writing through the LLC with the cpu Chris Wilson
2013-08-06 14:24 ` Ville Syrjälä [this message]
2013-08-06 16:16 ` Chris Wilson
2013-08-06 18:03 ` Ville Syrjälä
2013-08-06 18:45 ` Chris Wilson
2013-08-06 19:08 ` Ville Syrjälä
2013-08-06 19:26 ` Chris Wilson
2013-08-06 12:17 ` [PATCH 4/4] drm/i915: Only do a chipset flush after a clflush Chris Wilson
2013-08-06 14:25 ` [PATCH 1/4] drm/i915: Rename I915_CACHE_MLC_LLC to L3_LLC for Ivybridge Ville Syrjälä
2013-08-06 14:36 ` Daniel Vetter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130806142425.GH5004@intel.com \
--to=ville.syrjala@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.