* [CI 1/2] drm/i915: Support for creating write combined type vmaps
@ 2016-08-12 11:39 Chris Wilson
2016-08-12 11:39 ` [CI 2/2] drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory Chris Wilson
2016-08-12 12:04 ` ✗ Ro.CI.BAT: failure for series starting with [CI,1/2] drm/i915: Support for creating write combined type vmaps Patchwork
0 siblings, 2 replies; 5+ messages in thread
From: Chris Wilson @ 2016-08-12 11:39 UTC (permalink / raw)
To: intel-gfx
vmap has a provision for controlling the page protection bits, which we
can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
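
As a rough illustration of the policy described above, here is a minimal userspace sketch, not the driver code. It keeps one cached mapping per object, stores the mapping type in the low bits of the page-aligned pointer, and reports -EBUSY if a pinned object is asked for a different type. The names struct model_obj, model_pin_map, model_unpin_map and the MODEL_PAGE_* constants are hypothetical, and aligned_alloc()/free() merely stand in for vmap()/vunmap().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096UL                     /* assumed page size */
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))

enum map_type { MAP_WB = 0, MAP_WC };

/* Hypothetical stand-in for the GEM object: only what the policy needs. */
struct model_obj {
	void *mapping;          /* page-aligned pointer | map_type in low bits */
	unsigned int pin_count;
};

static void *pack_bits(void *ptr, unsigned long bits)
{
	return (void *)((uintptr_t)ptr | bits);
}

static void *unpack_bits(void *packed, unsigned long *bits)
{
	uintptr_t v = (uintptr_t)packed;

	*bits = v & ~MODEL_PAGE_MASK;           /* low bits: mapping type */
	return (void *)(v & MODEL_PAGE_MASK);   /* high bits: the address */
}

/* Returns 0 and stores the mapping in *out, or a negative errno. */
static int model_pin_map(struct model_obj *obj, enum map_type type, void **out)
{
	unsigned long has_type;
	void *ptr;
	int pinned;

	obj->pin_count++;
	pinned = obj->pin_count > 1;

	ptr = unpack_bits(obj->mapping, &has_type);
	if (ptr && has_type != (unsigned long)type) {
		if (pinned) {
			/* Someone else still uses the old mapping type. */
			obj->pin_count--;
			return -EBUSY;
		}
		free(ptr);                      /* stands in for vunmap()/kunmap() */
		ptr = obj->mapping = NULL;
	}

	if (!ptr) {
		/* Stands in for vmap() with the pgprot chosen from type. */
		ptr = aligned_alloc(MODEL_PAGE_SIZE, MODEL_PAGE_SIZE);
		if (!ptr) {
			obj->pin_count--;
			return -ENOMEM;
		}
		obj->mapping = pack_bits(ptr, type);
	}

	*out = ptr;
	return 0;
}

static void model_unpin_map(struct model_obj *obj)
{
	assert(obj->pin_count > 0);
	obj->pin_count--;
}
```

In this model, a second pin with a different type fails only while an earlier pin is still held; once every pin is dropped, the next caller may switch the type and the old mapping is torn down first.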
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT.
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_gem_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON. (Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 21 ++++++++++--
drivers/gpu/drm/i915/i915_gem.c | 58 +++++++++++++++++++++++++++------
drivers/gpu/drm/i915/i915_gem_dmabuf.c | 2 +-
drivers/gpu/drm/i915/intel_lrc.c | 8 ++---
drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +-
5 files changed, 73 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7971c76852df..654aabe76efc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3144,13 +3144,20 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
obj->pages_pin_count--;
}
+enum i915_map_type {
+ I915_MAP_WB = 0,
+ I915_MAP_WC,
+};
+
/**
* i915_gem_object_pin_map - return a contiguous mapping of the entire object
* @obj - the object to map into kernel address space
+ * @type - the type of mapping, used to select pgprot_t
*
* Calls i915_gem_object_pin_pages() to prevent reaping of the object's
* pages and then returns a contiguous mapping of the backing storage into
- * the kernel address space.
+ * the kernel address space. Based on the @type of mapping, the PTE will be
+ * set to either WriteBack or WriteCombine (via pgprot_t).
*
* The caller must hold the struct_mutex, and is responsible for calling
* i915_gem_object_unpin_map() when the mapping is no longer required.
@@ -3158,7 +3165,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
* Returns the pointer through which to access the mapped object, or an
* ERR_PTR() on error.
*/
-void *__must_check i915_gem_object_pin_map(struct drm_i915_gem_object *obj);
+void *__must_check i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
+ enum i915_map_type type);
/**
* i915_gem_object_unpin_map - releases an earlier mapping
@@ -3899,4 +3907,13 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
return false;
}
+#define ptr_unpack_bits(ptr, bits) ({ \
+ unsigned long __v = (unsigned long)(ptr); \
+ (bits) = __v & ~PAGE_MASK; \
+ (typeof(ptr))(__v & PAGE_MASK); \
+})
+
+#define ptr_pack_bits(ptr, bits) \
+ ((typeof(ptr))((unsigned long)(ptr) | (bits)))
+
#endif
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 03548db54e6b..5566916870eb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2077,6 +2077,7 @@ i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
list_del(&obj->global_list);
if (obj->mapping) {
+ /* low bits are ignored by is_vmalloc_addr and kmap_to_page */
if (is_vmalloc_addr(obj->mapping))
vunmap(obj->mapping);
else
@@ -2253,7 +2254,8 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
}
/* The 'mapping' part of i915_gem_object_pin_map() below */
-static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
+static void *i915_gem_object_map(const struct drm_i915_gem_object *obj,
+ enum i915_map_type type)
{
unsigned long n_pages = obj->base.size >> PAGE_SHIFT;
struct sg_table *sgt = obj->pages;
@@ -2262,10 +2264,11 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
struct page *stack_pages[32];
struct page **pages = stack_pages;
unsigned long i = 0;
+ pgprot_t pgprot;
void *addr;
/* A single page can always be kmapped */
- if (n_pages == 1)
+ if (n_pages == 1 && type == I915_MAP_WB)
return kmap(sg_page(sgt->sgl));
if (n_pages > ARRAY_SIZE(stack_pages)) {
@@ -2281,7 +2284,15 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
/* Check that we have the expected number of pages */
GEM_BUG_ON(i != n_pages);
- addr = vmap(pages, n_pages, 0, PAGE_KERNEL);
+ switch (type) {
+ case I915_MAP_WB:
+ pgprot = PAGE_KERNEL;
+ break;
+ case I915_MAP_WC:
+ pgprot = pgprot_writecombine(PAGE_KERNEL_IO);
+ break;
+ }
+ addr = vmap(pages, n_pages, 0, pgprot);
if (pages != stack_pages)
drm_free_large(pages);
@@ -2290,27 +2301,54 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
}
/* get, pin, and map the pages of the object into kernel space */
-void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj)
+void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
+ enum i915_map_type type)
{
+ enum i915_map_type has_type;
+ bool pinned;
+ void *ptr;
int ret;
lockdep_assert_held(&obj->base.dev->struct_mutex);
+ GEM_BUG_ON(!i915_gem_object_has_struct_page(obj));
ret = i915_gem_object_get_pages(obj);
if (ret)
return ERR_PTR(ret);
i915_gem_object_pin_pages(obj);
+ pinned = obj->pages_pin_count > 1;
- if (!obj->mapping) {
- obj->mapping = i915_gem_object_map(obj);
- if (!obj->mapping) {
- i915_gem_object_unpin_pages(obj);
- return ERR_PTR(-ENOMEM);
+ ptr = ptr_unpack_bits(obj->mapping, has_type);
+ if (ptr && has_type != type) {
+ if (pinned) {
+ ret = -EBUSY;
+ goto err;
}
+
+ if (is_vmalloc_addr(ptr))
+ vunmap(ptr);
+ else
+ kunmap(kmap_to_page(ptr));
+
+ ptr = obj->mapping = NULL;
}
- return obj->mapping;
+ if (!ptr) {
+ ptr = i915_gem_object_map(obj, type);
+ if (!ptr) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ obj->mapping = ptr_pack_bits(ptr, type);
+ }
+
+ return ptr;
+
+err:
+ i915_gem_object_unpin_pages(obj);
+ return ERR_PTR(ret);
}
static void
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index c60a8d5bbad0..10265bb35604 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -119,7 +119,7 @@ static void *i915_gem_dmabuf_vmap(struct dma_buf *dma_buf)
if (ret)
return ERR_PTR(ret);
- addr = i915_gem_object_pin_map(obj);
+ addr = i915_gem_object_pin_map(obj, I915_MAP_WB);
mutex_unlock(&dev->struct_mutex);
return addr;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c7f4b64b16f6..c24ac39d51f6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -780,7 +780,7 @@ static int intel_lr_context_pin(struct i915_gem_context *ctx,
if (ret)
goto err;
- vaddr = i915_gem_object_pin_map(ce->state);
+ vaddr = i915_gem_object_pin_map(ce->state, I915_MAP_WB);
if (IS_ERR(vaddr)) {
ret = PTR_ERR(vaddr);
goto unpin_ctx_obj;
@@ -1755,7 +1755,7 @@ lrc_setup_hws(struct intel_engine_cs *engine,
/* The HWSP is part of the default context object in LRC mode. */
engine->status_page.gfx_addr = i915_gem_obj_ggtt_offset(dctx_obj) +
LRC_PPHWSP_PN * PAGE_SIZE;
- hws = i915_gem_object_pin_map(dctx_obj);
+ hws = i915_gem_object_pin_map(dctx_obj, I915_MAP_WB);
if (IS_ERR(hws))
return PTR_ERR(hws);
engine->status_page.page_addr = hws + LRC_PPHWSP_PN * PAGE_SIZE;
@@ -1968,7 +1968,7 @@ populate_lr_context(struct i915_gem_context *ctx,
return ret;
}
- vaddr = i915_gem_object_pin_map(ctx_obj);
+ vaddr = i915_gem_object_pin_map(ctx_obj, I915_MAP_WB);
if (IS_ERR(vaddr)) {
ret = PTR_ERR(vaddr);
DRM_DEBUG_DRIVER("Could not map object pages! (%d)\n", ret);
@@ -2189,7 +2189,7 @@ void intel_lr_context_reset(struct drm_i915_private *dev_priv,
if (!ctx_obj)
continue;
- vaddr = i915_gem_object_pin_map(ctx_obj);
+ vaddr = i915_gem_object_pin_map(ctx_obj, I915_MAP_WB);
if (WARN_ON(IS_ERR(vaddr)))
continue;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ed19868df9c6..75a9f635c3dc 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1951,7 +1951,7 @@ int intel_ring_pin(struct intel_ring *ring)
if (ret)
goto err_unpin;
- addr = i915_gem_object_pin_map(obj);
+ addr = i915_gem_object_pin_map(obj, I915_MAP_WB);
if (IS_ERR(addr)) {
ret = PTR_ERR(addr);
goto err_unpin;
--
2.8.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* [CI 2/2] drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory
2016-08-12 11:39 [CI 1/2] drm/i915: Support for creating write combined type vmaps Chris Wilson
@ 2016-08-12 11:39 ` Chris Wilson
2016-08-12 12:30 ` Ville Syrjälä
2016-08-12 12:04 ` ✗ Ro.CI.BAT: failure for series starting with [CI,1/2] drm/i915: Support for creating write combined type vmaps Patchwork
1 sibling, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2016-08-12 11:39 UTC (permalink / raw)
To: intel-gfx
This patch provides the infrastructure for performing a 16-byte aligned
read from WC memory using non-temporal instructions introduced with SSE4.1.
Using movntdqa we can bypass the CPU caches and read directly from memory,
ignoring the page attributes set on the CPU PTE, i.e. negating the
impact of an otherwise UC access. Copying using movntdqa from WC is almost
as fast as reading from WB memory, modulo the possibility of either hitting
the CPU cache or leaving the data in the CPU cache for the next consumer.
(The CPU cache itself may be flushed for the region of the movntdqa and on
later access the movntdqa reads from a separate internal buffer for the
cacheline.) The write back to the memory is however cached.
This will be used in later patches to accelerate accessing WC memory.
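
The kernel-doc in this patch says that @src and @dst must be 16-byte aligned, that @len must be a multiple of 16, and that the function returns false when those preconditions are not met or acceleration is unavailable. A caller therefore needs a fallback. The following is only a userspace sketch of that contract: model_memcpy_from_wc, copy_from_wc_or_fallback and model_has_movntdqa are hypothetical names, and a plain memcpy() stands in for the movntdqa loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical runtime flag; the patch uses a static key set from CPU features. */
static bool model_has_movntdqa = true;

/* Model of the i915_memcpy_from_wc() contract described in the kernel-doc. */
static bool model_memcpy_from_wc(void *dst, const void *src, unsigned long len)
{
	/* Refuse unless both pointers and the length are multiples of 16. */
	if (((uintptr_t)dst | (uintptr_t)src | len) & 15)
		return false;

	if (!model_has_movntdqa)
		return false;

	/* len == 0 is the "is acceleration available?" probe: nothing to copy. */
	if (len)
		memcpy(dst, src, len);  /* stand-in for the movntdqa loop */

	return true;
}

/* Caller-side pattern: try the accelerated path, else use an ordinary copy. */
static void copy_from_wc_or_fallback(void *dst, const void *src,
				     unsigned long len)
{
	assert(dst && src);

	if (!model_memcpy_from_wc(dst, src, len))
		memcpy(dst, src, len);
}
```

Probing once with (NULL, NULL, 0) lets a caller decide up front whether to lay out its buffers for the aligned path at all.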
v2: Report whether the accelerated copy is successful/possible.
v3: Function alignment override was only necessary when using the
function target("sse4.1") - which is not necessary for emitting movntdqa
from __asm__.
v4: Improve notes on CPU cache behaviour vs non-temporal stores.
v5: Fix byte offsets for unrolled moves.
v6: Find all remaining typos of "movntqda", use kernel_fpu_begin.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/Makefile | 3 ++
drivers/gpu/drm/i915/i915_drv.c | 2 +
drivers/gpu/drm/i915/i915_drv.h | 3 ++
drivers/gpu/drm/i915/i915_memcpy.c | 101 +++++++++++++++++++++++++++++++++++++
4 files changed, 109 insertions(+)
create mode 100644 drivers/gpu/drm/i915/i915_memcpy.c
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index dda724f04445..3412413408c0 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -3,12 +3,15 @@
# Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher.
subdir-ccflags-$(CONFIG_DRM_I915_WERROR) := -Werror
+subdir-ccflags-y += \
+ $(call as-instr,movntdqa (%eax)$(comma)%xmm0,-DCONFIG_AS_MOVNTDQA)
# Please keep these build lists sorted!
# core driver code
i915-y := i915_drv.o \
i915_irq.o \
+ i915_memcpy.o \
i915_params.o \
i915_pci.o \
i915_suspend.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index d82f96b2a47e..13ae340ef1f3 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -827,6 +827,8 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,
mutex_init(&dev_priv->wm.wm_mutex);
mutex_init(&dev_priv->pps_mutex);
+ i915_memcpy_init_early(dev_priv);
+
ret = i915_workqueues_init(dev_priv);
if (ret < 0)
return ret;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 654aabe76efc..bf193ba1574e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3907,6 +3907,9 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
return false;
}
+void i915_memcpy_init_early(struct drm_i915_private *dev_priv);
+bool i915_memcpy_from_wc(void *dst, const void *src, unsigned long len);
+
#define ptr_unpack_bits(ptr, bits) ({ \
unsigned long __v = (unsigned long)(ptr); \
(bits) = __v & ~PAGE_MASK; \
diff --git a/drivers/gpu/drm/i915/i915_memcpy.c b/drivers/gpu/drm/i915/i915_memcpy.c
new file mode 100644
index 000000000000..6f1df0ec8a81
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_memcpy.c
@@ -0,0 +1,101 @@
+/*
+ * Copyright © 2016 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <asm/fpu/api.h>
+
+#include "i915_drv.h"
+
+DEFINE_STATIC_KEY_FALSE(has_movntdqa);
+
+#ifdef CONFIG_AS_MOVNTDQA
+static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
+{
+ kernel_fpu_begin();
+
+ len >>= 4;
+ while (len >= 4) {
+ asm("movntdqa (%0), %%xmm0\n"
+ "movntdqa 16(%0), %%xmm1\n"
+ "movntdqa 32(%0), %%xmm2\n"
+ "movntdqa 48(%0), %%xmm3\n"
+ "movaps %%xmm0, (%1)\n"
+ "movaps %%xmm1, 16(%1)\n"
+ "movaps %%xmm2, 32(%1)\n"
+ "movaps %%xmm3, 48(%1)\n"
+ :: "r" (src), "r" (dst) : "memory");
+ src += 64;
+ dst += 64;
+ len -= 4;
+ }
+ while (len--) {
+ asm("movntdqa (%0), %%xmm0\n"
+ "movaps %%xmm0, (%1)\n"
+ :: "r" (src), "r" (dst) : "memory");
+ src += 16;
+ dst += 16;
+ }
+
+ kernel_fpu_end();
+}
+#endif
+
+/**
+ * i915_memcpy_from_wc: perform an accelerated *aligned* read from WC
+ * @dst: destination pointer
+ * @src: source pointer
+ * @len: how many bytes to copy
+ *
+ * i915_memcpy_from_wc copies @len bytes from @src to @dst using
+ * non-temporal instructions where available. Note that all arguments
+ * (@src, @dst) must be aligned to 16 bytes and @len must be a multiple
+ * of 16.
+ *
+ * To test whether accelerated reads from WC are supported, use
+ * i915_memcpy_from_wc(NULL, NULL, 0);
+ *
+ * Returns true if the copy was successful, false if the preconditions
+ * are not met.
+ */
+bool i915_memcpy_from_wc(void *dst, const void *src, unsigned long len)
+{
+ if (unlikely(((unsigned long)dst | (unsigned long)src | len) & 15))
+ return false;
+
+#ifdef CONFIG_AS_MOVNTDQA
+ if (static_branch_likely(&has_movntdqa)) {
+ if (likely(len))
+ __memcpy_ntdqa(dst, src, len);
+ return true;
+ }
+#endif
+
+ return false;
+}
+
+void i915_memcpy_init_early(struct drm_i915_private *dev_priv)
+{
+ if (static_cpu_has(X86_FEATURE_XMM4_1))
+ static_branch_enable(&has_movntdqa);
+}
--
2.8.1
* ✗ Ro.CI.BAT: failure for series starting with [CI,1/2] drm/i915: Support for creating write combined type vmaps
2016-08-12 11:39 [CI 1/2] drm/i915: Support for creating write combined type vmaps Chris Wilson
2016-08-12 11:39 ` [CI 2/2] drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory Chris Wilson
@ 2016-08-12 12:04 ` Patchwork
1 sibling, 0 replies; 5+ messages in thread
From: Patchwork @ 2016-08-12 12:04 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [CI,1/2] drm/i915: Support for creating write combined type vmaps
URL : https://patchwork.freedesktop.org/series/11018/
State : failure
== Summary ==
Series 11018v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/11018/revisions/1/mbox
Test kms_cursor_legacy:
        Subgroup basic-cursor-vs-flip-varying-size:
                fail       -> PASS       (ro-ilk1-i5-650)
        Subgroup basic-flip-vs-cursor-legacy:
                fail       -> PASS       (fi-hsw-i7-4770k)
                fail       -> PASS       (ro-bdw-i5-5250u)
        Subgroup basic-flip-vs-cursor-varying-size:
                pass       -> FAIL       (ro-skl3-i5-6260u)
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-a:
                dmesg-warn -> PASS       (ro-bdw-i7-5600u)
        Subgroup suspend-read-crc-pipe-b:
                dmesg-warn -> SKIP       (ro-bdw-i5-5250u)

fi-hsw-i7-4770k  total:244  pass:222  dwarn:0  dfail:0  fail:0  skip:22
fi-kbl-qkkr      total:183  pass:155  dwarn:2  dfail:0  fail:1  skip:24
fi-snb-i7-2600   total:244  pass:202  dwarn:0  dfail:0  fail:0  skip:42
ro-bdw-i5-5250u  total:240  pass:219  dwarn:1  dfail:0  fail:1  skip:19
ro-bdw-i7-5600u  total:240  pass:207  dwarn:0  dfail:0  fail:1  skip:32
ro-bsw-n3050     total:240  pass:194  dwarn:0  dfail:0  fail:4  skip:42
ro-hsw-i3-4010u  total:240  pass:214  dwarn:0  dfail:0  fail:0  skip:26
ro-hsw-i7-4770r  total:240  pass:185  dwarn:0  dfail:0  fail:0  skip:55
ro-ilk1-i5-650   total:235  pass:174  dwarn:0  dfail:0  fail:1  skip:60
ro-ivb-i7-3770   total:240  pass:205  dwarn:0  dfail:0  fail:0  skip:35
ro-ivb2-i7-3770  total:240  pass:209  dwarn:0  dfail:0  fail:0  skip:31
ro-skl3-i5-6260u total:240  pass:222  dwarn:0  dfail:0  fail:4  skip:14
fi-skl-i7-6700k failed to connect after reboot
ro-byt-n2820 failed to connect after reboot
Results at /archive/results/CI_IGT_test/RO_Patchwork_1853/
6b35e06 drm-intel-nightly: 2016y-08m-12d-11h-13m-26s UTC integration manifest
54fa4c9 drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory
5c3e921 drm/i915: Support for creating write combined type vmaps
* Re: [CI 2/2] drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory
2016-08-12 11:39 ` [CI 2/2] drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory Chris Wilson
@ 2016-08-12 12:30 ` Ville Syrjälä
2016-08-12 12:42 ` Chris Wilson
0 siblings, 1 reply; 5+ messages in thread
From: Ville Syrjälä @ 2016-08-12 12:30 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
On Fri, Aug 12, 2016 at 12:39:59PM +0100, Chris Wilson wrote:
> This patch provides the infrastructure for performing a 16-byte aligned
> read from WC memory using non-temporal instructions introduced with SSE4.1.
> Using movntdqa we can bypass the CPU caches and read directly from memory,
> ignoring the page attributes set on the CPU PTE, i.e. negating the
> impact of an otherwise UC access. Copying using movntdqa from WC is almost
> as fast as reading from WB memory, modulo the possibility of either hitting
> the CPU cache or leaving the data in the CPU cache for the next consumer.
> (The CPU cache itself may be flushed for the region of the movntdqa and on
> later access the movntdqa reads from a separate internal buffer for the
> cacheline.) The write back to the memory is however cached.
>
> This will be used in later patches to accelerate accessing WC memory.
>
> v2: Report whether the accelerated copy is successful/possible.
> v3: Function alignment override was only necessary when using the
> function target("sse4.1") - which is not necessary for emitting movntdqa
> from __asm__.
> v4: Improve notes on CPU cache behaviour vs non-temporal stores.
> v5: Fix byte offsets for unrolled moves.
> v6: Find all remaining typos of "movntqda", use kernel_fpu_begin.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Akash Goel <akash.goel@intel.com>
> Cc: Damien Lespiau <damien.lespiau@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
> drivers/gpu/drm/i915/Makefile | 3 ++
> drivers/gpu/drm/i915/i915_drv.c | 2 +
> drivers/gpu/drm/i915/i915_drv.h | 3 ++
> drivers/gpu/drm/i915/i915_memcpy.c | 101 +++++++++++++++++++++++++++++++++++++
> 4 files changed, 109 insertions(+)
> create mode 100644 drivers/gpu/drm/i915/i915_memcpy.c
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index dda724f04445..3412413408c0 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -3,12 +3,15 @@
> # Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher.
>
> subdir-ccflags-$(CONFIG_DRM_I915_WERROR) := -Werror
> +subdir-ccflags-y += \
> + $(call as-instr,movntdqa (%eax)$(comma)%xmm0,-DCONFIG_AS_MOVNTDQA)
>
> # Please keep these build lists sorted!
>
> # core driver code
> i915-y := i915_drv.o \
> i915_irq.o \
> + i915_memcpy.o \
> i915_params.o \
> i915_pci.o \
> i915_suspend.o \
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index d82f96b2a47e..13ae340ef1f3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -827,6 +827,8 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,
> mutex_init(&dev_priv->wm.wm_mutex);
> mutex_init(&dev_priv->pps_mutex);
>
> + i915_memcpy_init_early(dev_priv);
> +
> ret = i915_workqueues_init(dev_priv);
> if (ret < 0)
> return ret;
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 654aabe76efc..bf193ba1574e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3907,6 +3907,9 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
> return false;
> }
>
> +void i915_memcpy_init_early(struct drm_i915_private *dev_priv);
> +bool i915_memcpy_from_wc(void *dst, const void *src, unsigned long len);
> +
> #define ptr_unpack_bits(ptr, bits) ({ \
> unsigned long __v = (unsigned long)(ptr); \
> (bits) = __v & ~PAGE_MASK; \
> diff --git a/drivers/gpu/drm/i915/i915_memcpy.c b/drivers/gpu/drm/i915/i915_memcpy.c
> new file mode 100644
> index 000000000000..6f1df0ec8a81
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_memcpy.c
> @@ -0,0 +1,101 @@
> +/*
> + * Copyright © 2016 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include <linux/kernel.h>
> +#include <asm/fpu/api.h>
> +
> +#include "i915_drv.h"
> +
> +DEFINE_STATIC_KEY_FALSE(has_movntdqa);
> +
> +#ifdef CONFIG_AS_MOVNTDQA
> +static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
> +{
> + kernel_fpu_begin();
> +
> + len >>= 4;
> + while (len >= 4) {
> + asm("movntdqa (%0), %%xmm0\n"
> + "movntdqa 16(%0), %%xmm1\n"
> + "movntdqa 32(%0), %%xmm2\n"
> + "movntdqa 48(%0), %%xmm3\n"
> + "movaps %%xmm0, (%1)\n"
> + "movaps %%xmm1, 16(%1)\n"
> + "movaps %%xmm2, 32(%1)\n"
> + "movaps %%xmm3, 48(%1)\n"
Not using sse2 movntdq for the store? No benefit or?
> + :: "r" (src), "r" (dst) : "memory");
> + src += 64;
> + dst += 64;
> + len -= 4;
> + }
> + while (len--) {
> + asm("movntdqa (%0), %%xmm0\n"
> + "movaps %%xmm0, (%1)\n"
> + :: "r" (src), "r" (dst) : "memory");
> + src += 16;
> + dst += 16;
> + }
> +
> + kernel_fpu_end();
> +}
> +#endif
> +
> +/**
> + * i915_memcpy_from_wc: perform an accelerated *aligned* read from WC
> + * @dst: destination pointer
> + * @src: source pointer
> + * @len: how many bytes to copy
> + *
> + * i915_memcpy_from_wc copies @len bytes from @src to @dst using
> + * non-temporal instructions where available. Note that all arguments
> + * (@src, @dst) must be aligned to 16 bytes and @len must be a multiple
> + * of 16.
> + *
> + * To test whether accelerated reads from WC are supported, use
> + * i915_memcpy_from_wc(NULL, NULL, 0);
> + *
> + * Returns true if the copy was successful, false if the preconditions
> + * are not met.
> + */
> +bool i915_memcpy_from_wc(void *dst, const void *src, unsigned long len)
> +{
> + if (unlikely(((unsigned long)dst | (unsigned long)src | len) & 15))
> + return false;
> +
> +#ifdef CONFIG_AS_MOVNTDQA
> + if (static_branch_likely(&has_movntdqa)) {
> + if (likely(len))
> + __memcpy_ntdqa(dst, src, len);
> + return true;
> + }
> +#endif
> +
> + return false;
> +}
> +
> +void i915_memcpy_init_early(struct drm_i915_private *dev_priv)
> +{
> + if (static_cpu_has(X86_FEATURE_XMM4_1))
> + static_branch_enable(&has_movntdqa);
> +}
> --
> 2.8.1
--
Ville Syrjälä
Intel OTC
* Re: [CI 2/2] drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory
2016-08-12 12:30 ` Ville Syrjälä
@ 2016-08-12 12:42 ` Chris Wilson
0 siblings, 0 replies; 5+ messages in thread
From: Chris Wilson @ 2016-08-12 12:42 UTC (permalink / raw)
To: Ville Syrjälä; +Cc: intel-gfx
On Fri, Aug 12, 2016 at 03:30:55PM +0300, Ville Syrjälä wrote:
> On Fri, Aug 12, 2016 at 12:39:59PM +0100, Chris Wilson wrote:
> > +#ifdef CONFIG_AS_MOVNTDQA
> > +static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
> > +{
> > + kernel_fpu_begin();
> > +
> > + len >>= 4;
> > + while (len >= 4) {
> > + asm("movntdqa (%0), %%xmm0\n"
> > + "movntdqa 16(%0), %%xmm1\n"
> > + "movntdqa 32(%0), %%xmm2\n"
> > + "movntdqa 48(%0), %%xmm3\n"
> > + "movaps %%xmm0, (%1)\n"
> > + "movaps %%xmm1, 16(%1)\n"
> > + "movaps %%xmm2, 32(%1)\n"
> > + "movaps %%xmm3, 48(%1)\n"
>
> Not using sse2 movntdq for the store? No benefit or?
At least in the scenarios we, ok I, have in mind, leaving the dst in the
cache benefits us as we immediately process/move the data on.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
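
For readers unfamiliar with the two store flavours discussed in this exchange, the sketch below contrasts an ordinary cached 16-byte store (what movaps does in the patch) with an SSE2 non-temporal store (movntdq, as Ville asks about). It is a userspace illustration only, using compiler intrinsics rather than the patch's inline asm, with ordinary loads instead of movntdqa, and it measures nothing. The function names are hypothetical, and on targets without SSE2 both variants fall back to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Both pointers and the length must be multiples of 16, as in the patch. */
static int is_aligned16(const void *dst, const void *src, size_t len)
{
	return (((uintptr_t)dst | (uintptr_t)src | len) & 15) == 0;
}

/* Ordinary aligned stores: the destination lines are left in the CPU cache. */
static void copy_cached_store(void *dst, const void *src, size_t len)
{
	assert(is_aligned16(dst, src, len));
#if defined(__SSE2__)
	const __m128i *s = (const __m128i *)src;
	__m128i *d = (__m128i *)dst;

	for (size_t i = 0; i < len / 16; i++)
		_mm_store_si128(&d[i], _mm_load_si128(&s[i]));
#else
	memcpy(dst, src, len);
#endif
}

/* Non-temporal stores: hint that the destination will not be reused soon. */
static void copy_streaming_store(void *dst, const void *src, size_t len)
{
	assert(is_aligned16(dst, src, len));
#if defined(__SSE2__)
	const __m128i *s = (const __m128i *)src;
	__m128i *d = (__m128i *)dst;

	for (size_t i = 0; i < len / 16; i++)
		_mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
	_mm_sfence();   /* order the streaming stores before later accesses */
#else
	memcpy(dst, src, len);
#endif
}
```

Both variants produce the same bytes in the destination; they differ only in whether those bytes are likely to be cache-resident afterwards, which is the trade-off the reply above describes.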