* [PATCH 0/2] Support for creation of unbound WC user mappings
@ 2015-01-02 10:59 akash.goel
2015-01-02 10:59 ` [PATCH 1/2] drm/i915: Broaden application of set-domain(GTT) akash.goel
2015-01-02 10:59 ` [PATCH 2/2] drm/i915: Support creation of unbound wc user mappings for objects akash.goel
0 siblings, 2 replies; 4+ messages in thread
From: akash.goel @ 2015-01-02 10:59 UTC (permalink / raw)
To: intel-gfx; +Cc: daniel.vetter, Akash Goel
From: Akash Goel <akash.goel@intel.com>
This patch series adds support for creating unbound write-combining mappings
of GEM objects.
The aim is to improve CPU write performance: with such a mapping, writes are
almost 50% faster than with mmap_gtt.
The new mapping is exercised by the IGT tests igt/gem_mmap_wc, igt/gem_tiled,
igt/gem_concurrent_blit, igt/gem_gtt_speed and igt/gem_fence_upload.
Akash Goel (1):
drm/i915: Support creation of unbound wc user mappings for objects
Chris Wilson (1):
drm/i915: Broaden application of set-domain(GTT)
drivers/gpu/drm/i915/i915_dma.c | 3 ++
drivers/gpu/drm/i915/i915_gem.c | 62 +++++++++++++++++++++++++----------------
include/uapi/drm/i915_drm.h | 9 ++++++
3 files changed, 50 insertions(+), 24 deletions(-)
--
1.9.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* [PATCH 1/2] drm/i915: Broaden application of set-domain(GTT)
2015-01-02 10:59 [PATCH 0/2] Support for creation of unbound WC user mappings akash.goel
@ 2015-01-02 10:59 ` akash.goel
2015-01-02 10:59 ` [PATCH 2/2] drm/i915: Support creation of unbound wc user mappings for objects akash.goel
1 sibling, 0 replies; 4+ messages in thread
From: akash.goel @ 2015-01-02 10:59 UTC (permalink / raw)
To: intel-gfx; +Cc: daniel.vetter
From: Chris Wilson <chris@chris-wilson.co.uk>
Previously, this was restricted to only operate on bound objects - to
make pointer access through the GTT to the object coherent with writes
to and from the GPU. A second usecase is drm_intel_bo_wait_rendering()
which at present does not function unless the object also happens to
be bound into the GGTT (on current systems that is becoming increasingly
rare, especially for the typical requests from mesa). A third usecase is
a future patch wishing to extend the coverage of the GTT domain to
include objects not bound into the GGTT but still in its coherent cache
domain. For the latter pair of requests, we need to operate on the
object regardless of its bind state.
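As a rough illustration of the behaviour change described above, here is a
toy userspace model in C. Everything in it (struct toy_obj, the toy_*
functions, the clflush counter) is hypothetical and only mimics the control
flow; it is not the driver code. The two domain values mirror
I915_GEM_DOMAIN_CPU and I915_GEM_DOMAIN_GTT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_DOMAIN_CPU 0x01u /* mirrors I915_GEM_DOMAIN_CPU */
#define TOY_DOMAIN_GTT 0x40u /* mirrors I915_GEM_DOMAIN_GTT */

/* Hypothetical, heavily simplified stand-in for a GEM object. */
struct toy_obj {
	bool bound_in_ggtt;    /* object has a GGTT binding */
	bool has_pages;        /* backing pages have been acquired */
	unsigned read_domains;
	unsigned write_domain;
	unsigned clflushes;    /* number of CPU cache flushes performed */
};

/* Domain transition shared by both models. */
static void toy_move_to_gtt(struct toy_obj *obj, bool write)
{
	if (obj->write_domain == TOY_DOMAIN_CPU) {
		obj->clflushes++;       /* flush dirty CPU cachelines */
		obj->write_domain = 0;
	}
	obj->read_domains |= TOY_DOMAIN_GTT;
	if (write) {
		obj->read_domains = TOY_DOMAIN_GTT;
		obj->write_domain = TOY_DOMAIN_GTT;
	}
}

/* Before the patch: unbound objects are rejected outright. */
static int toy_set_to_gtt_old(struct toy_obj *obj, bool write)
{
	assert(obj != NULL);
	if (!obj->bound_in_ggtt)
		return -EINVAL;
	if (obj->write_domain == TOY_DOMAIN_GTT)
		return 0;
	toy_move_to_gtt(obj, write);
	return 0;
}

/* Before the patch: the ioctl promoted -EINVAL to success, doing no work. */
static int toy_set_domain_ioctl_old(struct toy_obj *obj, bool write)
{
	int ret = toy_set_to_gtt_old(obj, write);

	return ret == -EINVAL ? 0 : ret;
}

/* After the patch: the bind state no longer matters. */
static int toy_set_domain_ioctl_new(struct toy_obj *obj, bool write)
{
	assert(obj != NULL);
	if (obj->write_domain == TOY_DOMAIN_GTT)
		return 0;
	obj->has_pages = true; /* stands in for i915_gem_object_get_pages() */
	toy_move_to_gtt(obj, write);
	return 0;
}
```

In this model an unbound object with a CPU write domain is left untouched by
the old ioctl path even though it reports success, while the new path
acquires the pages, flushes once and moves the object to the GTT domain.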
v2: After discussion with Akash, we came to the conclusion that the
get-pages was required in order for accurate domain tracking in the
corner cases (like the shrinker) and also useful for ensuring memory
coherency with earlier cached CPU mmaps in case userspace uses exotic
cache bypass (non-temporal) instructions.
v3: Fix the inactive object check.
v4: Rebase to latest drm-intel-nightly codebase
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Akash Goel <akash.goel@intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 43 ++++++++++++++++++-----------------------
1 file changed, 19 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ac65386..ef4e424 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -153,12 +153,6 @@ int i915_mutex_lock_interruptible(struct drm_device *dev)
return 0;
}
-static inline bool
-i915_gem_object_is_inactive(struct drm_i915_gem_object *obj)
-{
- return i915_gem_obj_bound_any(obj) && !obj->active;
-}
-
int
i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
struct drm_file *file)
@@ -1487,18 +1481,10 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
if (ret)
goto unref;
- if (read_domains & I915_GEM_DOMAIN_GTT) {
+ if (read_domains & I915_GEM_DOMAIN_GTT)
ret = i915_gem_object_set_to_gtt_domain(obj, write_domain != 0);
-
- /* Silently promote "you're not bound, there was nothing to do"
- * to success, since the client was just asking us to
- * make sure everything was done.
- */
- if (ret == -EINVAL)
- ret = 0;
- } else {
+ else
ret = i915_gem_object_set_to_cpu_domain(obj, write_domain != 0);
- }
unref:
drm_gem_object_unreference(&obj->base);
@@ -3698,15 +3684,10 @@ i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
int
i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
{
- struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
- struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
uint32_t old_write_domain, old_read_domains;
+ struct i915_vma *vma;
int ret;
- /* Not valid to be called on unbound objects. */
- if (vma == NULL)
- return -EINVAL;
-
if (obj->base.write_domain == I915_GEM_DOMAIN_GTT)
return 0;
@@ -3715,6 +3696,19 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
return ret;
i915_gem_object_retire(obj);
+
+ /* Flush and acquire obj->pages so that we are coherent through
+ * direct access in memory with previous cached writes through
+ * shmemfs and that our cache domain tracking remains valid.
+ * For example, if the obj->filp was moved to swap without us
+ * being notified and releasing the pages, we would mistakenly
+ * continue to assume that the obj remained out of the CPU cached
+ * domain.
+ */
+ ret = i915_gem_object_get_pages(obj);
+ if (ret)
+ return ret;
+
i915_gem_object_flush_cpu_write_domain(obj, false);
/* Serialise direct access to this object with the barriers for
@@ -3746,9 +3740,10 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
old_write_domain);
/* And bump the LRU for this access */
- if (i915_gem_object_is_inactive(obj))
+ vma = i915_gem_obj_to_ggtt(obj);
+ if (vma && drm_mm_node_allocated(&vma->node) && !obj->active)
list_move_tail(&vma->mm_list,
- &dev_priv->gtt.base.inactive_list);
+ &to_i915(obj->base.dev)->gtt.base.inactive_list);
return 0;
}
--
1.9.2
* [PATCH 2/2] drm/i915: Support creation of unbound wc user mappings for objects
2015-01-02 10:59 [PATCH 0/2] Support for creation of unbound WC user mappings akash.goel
2015-01-02 10:59 ` [PATCH 1/2] drm/i915: Broaden application of set-domain(GTT) akash.goel
@ 2015-01-02 10:59 ` akash.goel
2015-01-05 0:52 ` shuang.he
1 sibling, 1 reply; 4+ messages in thread
From: akash.goel @ 2015-01-02 10:59 UTC (permalink / raw)
To: intel-gfx; +Cc: daniel.vetter, Akash Goel
From: Akash Goel <akash.goel@intel.com>
This patch adds support for creating write-combining virtual mappings of a
GEM object. It is intended to provide the same functionality as the 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires that clients handle the linear-to-tiled conversion on
their own. The aim is to improve CPU write performance: with such a
mapping, writes and reads are almost 50% faster than with mmap_gtt. Like
the GTT mmapping, and unlike the regular CPU mmapping, it avoids the cache
flush after an update from the CPU side when the object is passed on to the
GPU. This type of mapping is especially useful for sub-region updates,
i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.
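To put a number on the sub-region case, here is a small back-of-the-envelope
sketch in C. It is not taken from the patch: the 64-byte cacheline size and
all toy_* names are assumptions made for illustration, and it only counts
cachelines that would need a clflush, ignoring every other cost.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHELINE_BYTES 64u /* assumed cacheline size */

/* Number of cachelines needed to cover 'bytes' bytes, rounded up. */
static size_t toy_cachelines(size_t bytes)
{
	return (bytes + TOY_CACHELINE_BYTES - 1) / TOY_CACHELINE_BYTES;
}

/*
 * Cached CPU mmap: per the description above, the whole object is
 * flushed no matter how small the update was.
 */
static size_t toy_flush_lines_cpu_mmap(size_t obj_size, size_t updated)
{
	assert(updated <= obj_size);
	return toy_cachelines(obj_size);
}

/* Write-combining mmap: writes bypass the cache, so nothing to clflush. */
static size_t toy_flush_lines_wc_mmap(size_t obj_size, size_t updated)
{
	assert(updated <= obj_size);
	return 0;
}
```

Under these assumptions, updating 4 KiB of a 4 MiB object through a cached
CPU mmap means flushing 65536 cachelines, against none through the
write-combining mapping.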
To ensure cache coherency before this mapping is used, the GTT domain
has been reused here. This provides the required cache flush if the object
is in the CPU domain, or synchronization against concurrent rendering.
Although access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions, and not all pages of the object may be updated through this
mapping at any given point in time. The call to get_pages in the
set_to_gtt_domain function, added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', guarantees the clflush, so
no cachelines will hold data for the object before it is accessed
through this map.
The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existing users). So
that userspace can detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
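The flag handling and version detection can be sketched as a standalone C
model. This is an illustration only: struct toy_mmap_args and the toy_*
functions are hypothetical, TOY_MMAP_WC mirrors I915_MMAP_WC, and the
assumption that a reported version of at least 1 means the flags field is
honoured is taken from the value this patch returns for
I915_PARAM_MMAP_VERSION.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MMAP_WC 0x1ull /* mirrors I915_MMAP_WC */

/* Hypothetical subset of the extended mmap ioctl arguments. */
struct toy_mmap_args {
	uint64_t flags;
};

/*
 * Models the argument checks the patch adds to the mmap ioctl:
 * unknown flags are rejected, and WC needs PAT support on the CPU.
 */
static int toy_check_mmap_args(const struct toy_mmap_args *args,
			       bool cpu_has_pat)
{
	assert(args != NULL);
	if (args->flags & ~TOY_MMAP_WC)
		return -EINVAL;
	if ((args->flags & TOY_MMAP_WC) && !cpu_has_pat)
		return -ENODEV;
	return 0;
}

/*
 * Userspace side: pick the flags to request from the result of a
 * getparam query. A failed query (older kernel) or a version below 1
 * is treated as "no flags support" (assumption, see above).
 */
static uint64_t toy_pick_mmap_flags(int getparam_ret, int mmap_version)
{
	if (getparam_ret != 0 || mmap_version < 1)
		return 0;
	return TOY_MMAP_WC;
}
```

A client would fall back to mmap_gtt or a plain CPU mmap whenever the picked
flags are 0 or the WC request fails with -ENODEV.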
v2: Fix error handling, invalid flag detection, renaming (ickle)
v3: Rebase to latest drm-intel-nightly codebase
The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.
Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
drivers/gpu/drm/i915/i915_dma.c | 3 +++
drivers/gpu/drm/i915/i915_gem.c | 19 +++++++++++++++++++
include/uapi/drm/i915_drm.h | 9 +++++++++
3 files changed, 31 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 52730ed..7fad6b8 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -143,6 +143,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
case I915_PARAM_HAS_COHERENT_PHYS_GTT:
value = 1;
break;
+ case I915_PARAM_MMAP_VERSION:
+ value = 1;
+ break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ef4e424..aeeb617 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1549,6 +1549,12 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
struct drm_gem_object *obj;
unsigned long addr;
+ if (args->flags & ~(I915_MMAP_WC))
+ return -EINVAL;
+
+ if (args->flags & I915_MMAP_WC && !cpu_has_pat)
+ return -ENODEV;
+
obj = drm_gem_object_lookup(dev, file, args->handle);
if (obj == NULL)
return -ENOENT;
@@ -1564,6 +1570,19 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
addr = vm_mmap(obj->filp, 0, args->size,
PROT_READ | PROT_WRITE, MAP_SHARED,
args->offset);
+ if (args->flags & I915_MMAP_WC) {
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+
+ down_write(&mm->mmap_sem);
+ vma = find_vma(mm, addr);
+ if (vma)
+ vma->vm_page_prot =
+ pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+ else
+ addr = -ENOMEM;
+ up_write(&mm->mmap_sem);
+ }
drm_gem_object_unreference_unlocked(obj);
if (IS_ERR((void *)addr))
return addr;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 2502622..c155a03 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -341,6 +341,7 @@ typedef struct drm_i915_irq_wait {
#define I915_PARAM_HAS_WT 27
#define I915_PARAM_CMD_PARSER_VERSION 28
#define I915_PARAM_HAS_COHERENT_PHYS_GTT 29
+#define I915_PARAM_MMAP_VERSION 30
typedef struct drm_i915_getparam {
int param;
@@ -488,6 +489,14 @@ struct drm_i915_gem_mmap {
* This is a fixed-size type for 32/64 compatibility.
*/
__u64 addr_ptr;
+
+ /**
+ * Flags for extended behaviour.
+ *
+ * Added in version 2.
+ */
+ __u64 flags;
+#define I915_MMAP_WC 0x1
};
struct drm_i915_gem_mmap_gtt {
--
1.9.2
* Re: [PATCH 2/2] drm/i915: Support creation of unbound wc user mappings for objects
2015-01-02 10:59 ` [PATCH 2/2] drm/i915: Support creation of unbound wc user mappings for objects akash.goel
@ 2015-01-05 0:52 ` shuang.he
0 siblings, 0 replies; 4+ messages in thread
From: shuang.he @ 2015-01-05 0:52 UTC (permalink / raw)
To: shuang.he, intel-gfx, akash.goel
Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
-------------------------------------Summary-------------------------------------
Platform  Delta   drm-intel-nightly  Series Applied
PNV       -1      363/364            362/364
ILK       -18     364/366            346/366
SNB       +3-1    443/450            445/450
IVB               496/498            496/498
BYT               288/289            288/289
HSW       +5-4    542/564            543/564
BDW               415/417            415/417
-------------------------------------Detailed-------------------------------------
Platform Test drm-intel-nightly Series Applied
*PNV igt_gem_exec_big PASS(2, M25M23) FAIL(1, M23)
ILK igt_kms_flip_nonexisting-fb DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_rcs-flip-vs-panning-interruptible DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_rcs-wf_vblank-vs-dpms-interruptible DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_render_direct-render DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_bcs-flip-vs-modeset-interruptible DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_blocking-absolute-wf_vblank-interruptible DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_busy-flip-interruptible DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_flip-vs-dpms-interruptible DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_flip-vs-panning DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_flip-vs-rmfb-interruptible DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_plain-flip-fb-recreate-interruptible DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_plain-flip-ts-check-interruptible DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_rcs-flip-vs-dpms DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_rcs-flip-vs-modeset DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_rcs-flip-vs-panning DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_vblank-vs-hang DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_wf_vblank-ts-check DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
ILK igt_kms_flip_wf_vblank-vs-modeset-interruptible DMESG_WARN(1, M26)PASS(1, M37) DMESG_WARN(1, M26)
*SNB igt_kms_flip_dpms-vs-vblank-race DMESG_WARN(2, M35M22) PASS(1, M22)
*SNB igt_kms_flip_dpms-vs-vblank-race-interruptible DMESG_WARN(2, M35M22) PASS(1, M22)
SNB igt_kms_flip_modeset-vs-vblank-race-interruptible DMESG_WARN(2, M35M22)PASS(1, M35) DMESG_WARN(1, M22)
SNB igt_kms_plane_plane-position-hole-pipe-B-plane-1 DMESG_WARN(1, M35)PASS(2, M35M22) PASS(1, M22)
*HSW igt_kms_flip_flip-vs-dpms-off-vs-modeset-interruptible DMESG_WARN(2, M40M19) PASS(1, M19)
*HSW igt_kms_flip_single-buffer-flip-vs-dpms-off-vs-modeset PASS(3, M40M19) DMESG_WARN(1, M19)
HSW igt_kms_flip_single-buffer-flip-vs-dpms-off-vs-modeset-interruptible DMESG_WARN(1, M40)PASS(1, M19) PASS(1, M19)
HSW igt_kms_plane_plane-panning-bottom-right-pipe-C-plane-1 TIMEOUT(2, M40)PASS(1, M19) PASS(1, M19)
*HSW igt_kms_plane_plane-panning-top-left-pipe-B-plane-2 PASS(2, M40M19) DMESG_FAIL(1, M19)
*HSW igt_kms_plane_plane-position-hole-pipe-B-plane-1 PASS(2, M40M19) DMESG_WARN(1, M19)
*HSW igt_kms_plane_plane-position-hole-pipe-B-plane-2 PASS(2, M40M19) TIMEOUT(1, M19)
*HSW igt_pm_rpm_modeset-non-lpsp-stress-no-wait NSPT(1, M19)DMESG_WARN(1, M40)PASS(1, M40) PASS(1, M19)
HSW igt_kms_flip_flip-vs-rmfb DMESG_WARN(1, M40)PASS(1, M19) PASS(1, M19)
Note: Pay particular attention to lines starting with '*'.