public inbox for intel-gfx@lists.freedesktop.org
* [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
@ 2016-12-19  9:19 Tvrtko Ursulin
  2016-12-19  9:46 ` ✗ Fi.CI.BAT: warning for " Patchwork
  2016-12-19  9:47 ` [PATCH] " Joonas Lahtinen
  0 siblings, 2 replies; 5+ messages in thread
From: Tvrtko Ursulin @ 2016-12-19  9:19 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

For some reason GCC 6.2.1 here unrolls the memcpy calls that copy to
and from the stack buffer in per-byte fashion, reloading an offset
constant before every store. It looks horrible, for example:

      ...
     fdc:       48 b8 41 00 00 00 00    movabs rax,0xffff880000000041
     fe3:       88 ff ff
     fe6:       44 88 74 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r14b
     feb:       48 b8 42 00 00 00 00    movabs rax,0xffff880000000042
     ff2:       88 ff ff
     ff5:       44 88 6c 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r13b
     ffa:       48 b8 43 00 00 00 00    movabs rax,0xffff880000000043
    1001:       88 ff ff
    1004:       44 88 64 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r12b
    1009:       48 b8 44 00 00 00 00    movabs rax,0xffff880000000044
    1010:       88 ff ff
    1013:       88 5c 06 80             mov    BYTE PTR [rsi+rax*1-0x80],bl
    1017:       48 b8 45 00 00 00 00    movabs rax,0xffff880000000045
    101e:       88 ff ff
    1021:       44 88 5c 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r11b
    1026:       48 b8 46 00 00 00 00    movabs rax,0xffff880000000046
    102d:       88 ff ff
    1030:       44 88 54 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r10b
    1035:       48 b8 47 00 00 00 00    movabs rax,0xffff880000000047
    103c:       88 ff ff
    103f:       44 88 4c 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r9b
    1044:       0f b6 5d d0             movzx  ebx,BYTE PTR [rbp-0x30]
    1048:       48 b8 48 00 00 00 00    movabs rax,0xffff880000000048
    104f:       88 ff ff
    1052:       88 5c 06 80             mov    BYTE PTR [rsi+rax*1-0x80],bl
    1056:       48 b8 49 00 00 00 00    movabs rax,0xffff880000000049
    105d:       88 ff ff
    1060:       40 88 7c 06 80          mov    BYTE PTR [rsi+rax*1-0x80],dil
    1065:       0f b6 5d cf             movzx  ebx,BYTE PTR [rbp-0x31]
    1069:       48 b8 4a 00 00 00 00    movabs rax,0xffff88000000004a
    1070:       88 ff ff
    1073:       88 5c 06 80             mov    BYTE PTR [rsi+rax*1-0x80],bl
    1077:       0f b6 7d ce             movzx  edi,BYTE PTR [rbp-0x32]
    107b:       48 b8 4b 00 00 00 00    movabs rax,0xffff88000000004b
      ...

So change the code slightly, which makes GCC generate more reasonable
code like:
  ...
 bf1:   48 89 78 b8             mov    QWORD PTR [rax-0x48],rdi
 bf5:   4c 89 60 c0             mov    QWORD PTR [rax-0x40],r12
 bf9:   48 89 58 c8             mov    QWORD PTR [rax-0x38],rbx
 bfd:   4c 89 58 d0             mov    QWORD PTR [rax-0x30],r11
 c01:   4c 89 50 d8             mov    QWORD PTR [rax-0x28],r10
  ...

This saves 2087 bytes of code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_fence_reg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index e03983973252..d665d2e74641 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -631,9 +631,9 @@ i915_gem_swizzle_page(struct page *page)
 	vaddr = kmap(page);
 
 	for (i = 0; i < PAGE_SIZE; i += 128) {
-		memcpy(temp, &vaddr[i], 64);
+		memcpy(&temp[0], &vaddr[i], 64);
 		memcpy(&vaddr[i], &vaddr[i + 64], 64);
-		memcpy(&vaddr[i + 64], temp, 64);
+		memcpy(&vaddr[i + 64], &temp[0], 64);
 	}
 
 	kunmap(page);
-- 
2.7.4
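
For readers who want to try the loop outside the kernel, here is a minimal
userspace sketch of the same swap, written with the &temp[0] spelling from
the patch. It is an illustration under stated assumptions, not the driver
code: swizzle_buffer(), swizzle_selftest() and SKETCH_PAGE_SIZE are
hypothetical names, the page size is assumed to be 4096, and a plain static
array stands in for the kmap()ed page. In C an array name decays to a
pointer to its first element, so temp and &temp[0] denote the same address
and the change is a no-op at the source level; only the generated code
differs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch; the kernel code uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096

/*
 * Hypothetical userspace stand-in for i915_gem_swizzle_page(): swap the
 * two 64-byte halves of every 128-byte block in the buffer.
 */
static void swizzle_buffer(char *vaddr)
{
	char temp[64];
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE; i += 128) {
		memcpy(&temp[0], &vaddr[i], 64);
		memcpy(&vaddr[i], &vaddr[i + 64], 64);
		memcpy(&vaddr[i + 64], &temp[0], 64);
	}
}

/* Returns 1 if the swap behaves as expected, 0 otherwise. */
static int swizzle_selftest(void)
{
	static char page[SKETCH_PAGE_SIZE];
	char temp[64];
	size_t i;

	/* Array decay: both spellings name the same address. */
	assert((void *)temp == (void *)&temp[0]);

	/* Fill each 128-byte block with the values 0..127. */
	for (i = 0; i < SKETCH_PAGE_SIZE; i++)
		page[i] = (char)(i & 0x7f);

	swizzle_buffer(page);

	/* The halves of the first block have traded places. */
	if (page[0] != 64 || page[64] != 0)
		return 0;

	/* Swizzling a second time restores the original contents. */
	swizzle_buffer(page);
	for (i = 0; i < SKETCH_PAGE_SIZE; i++)
		if (page[i] != (char)(i & 0x7f))
			return 0;

	return 1;
}
```

Calling swizzle_selftest() from a small main() is enough to exercise it.
Whether a given compiler shows the per-byte unrolling described above
depends on its version and flags, so this sketch is not expected to
reproduce the code size difference by itself.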

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Thread overview: 5+ messages
2016-12-19  9:19 [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page Tvrtko Ursulin
2016-12-19  9:46 ` ✗ Fi.CI.BAT: warning for " Patchwork
2016-12-19  9:47 ` [PATCH] " Joonas Lahtinen
2016-12-19 10:32   ` Jani Nikula
2016-12-20  9:48     ` Tvrtko Ursulin
