linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* updates for be8 patch series
@ 2013-07-22 16:33 Ben Dooks
  2013-07-22 16:49 ` Ben Dooks
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Dooks @ 2013-07-22 16:33 UTC (permalink / raw)
  To: linux-arm-kernel

I've updated the BE8 patch series after running further tests on the
kernel. I found an issue with the alignment handler (and I believe the
same issue with the trap handler).

I also ran tests with the module loader, which showed that if we are
in BE8 it needs to fix up the relocations. The modules also need to be
built with --be8, otherwise they cannot be easily relocated when loaded.

Should patches 3 and 4 be together (i.e. alter the LDFLAGS and the
module.c) and have I got the correct LDFLAGS?

Comments and testing welcome. I will push out a new series later with
these in and possibly rebased on 3.11-rc2.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* updates for be8 patch series
  2013-07-22 16:33 Ben Dooks
@ 2013-07-22 16:49 ` Ben Dooks
  0 siblings, 0 replies; 13+ messages in thread
From: Ben Dooks @ 2013-07-22 16:49 UTC (permalink / raw)
  To: linux-arm-kernel

On 22/07/13 17:33, Ben Dooks wrote:
> I've updated the BE8 patch series after running further tests on the
> kernel. I found an issue with the alignment handler (and I believe the
> same issue with the trap handler).
>
> I also ran tests with the module loader, which showed that if we are
> in BE8 it needs to fix up the relocations. The modules also need to be
> built with --be8, otherwise they cannot be easily relocated when loaded.
>
> Should patches 3 and 4 be together (i.e. alter the LDFLAGS and the
> module.c) and have I got the correct LDFLAGS?
>
> Comments and testing welcome. I will push out a new series later with
> these in and possibly rebased on 3.11-rc2.

New series based on 3.10 at:

	git://git.baserock.org/delta/linux.git
		baserock/311/be/core-v3
		baserock/311/be/atags-v3

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 13+ messages in thread

* updates for be8 patch series
       [not found] <CAA3XUr3h5C0oJVrN5iXsM=NFkLMz_Tssp28CysZfwMNS5KvO-Q@mail.gmail.com>
@ 2013-07-23 17:40 ` Ben Dooks
  2013-07-23 17:53   ` Victor Kamensky
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Dooks @ 2013-07-23 17:40 UTC (permalink / raw)
  To: linux-arm-kernel

On 23/07/13 18:24, Victor Kamensky wrote:
> Hi Ben,
>
> Wrt BE8 little endian instructions you will need to fix a couple more places:
>      ftrace arch/arm/kernel/ftrace.c
>      kprobes arch/arm/kernel/kprobes.c
> Also in big endian mode the atomic64_xxx functions will need fixes, otherwise
> perf counters will be truncated at the max of 32-bit values
>      atomic64 arch/arm/include/asm/atomic.h

Hmm, I thought ftrace already did the right endianness conversions,
although it is possible it could have missed one.

The atomic instructions should work fine for BE8; they're using
STRD, which should be correct.

> I've attached the board-independent (core) patch from my work that made
> a Pandaboard ES kernel run in BE mode. You can check my version of the
> fixes for the above issues in the patch. Look for the corresponding file
> changes. Please rework them according to the style and rules of your
> patch series. Note the patch itself contains fixes for a few other
> issues that are already fixed in your series. I'll cross-check and
> compare the individual areas later. I think you may find the attached
> patch interesting as well; it was developed independently from yours
> but matches it very well.

>

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 13+ messages in thread

* updates for be8 patch series
  2013-07-23 17:40 ` Ben Dooks
@ 2013-07-23 17:53   ` Victor Kamensky
  2013-07-23 18:02     ` Ben Dooks
  2013-07-23 18:03     ` Victor Kamensky
  0 siblings, 2 replies; 13+ messages in thread
From: Victor Kamensky @ 2013-07-23 17:53 UTC (permalink / raw)
  To: linux-arm-kernel

On 23 July 2013 10:40, Ben Dooks <ben.dooks@codethink.co.uk> wrote:
> On 23/07/13 18:24, Victor Kamensky wrote:
>>
>> Hi Ben,
>>
>> Wrt BE8 little endian instructions you will need to fix a couple more
>> places:
>>      ftrace arch/arm/kernel/ftrace.c
>>      kprobes arch/arm/kernel/kprobes.c
>> Also in big endian mode the atomic64_xxx functions will need fixes,
>> otherwise perf counters will be truncated at the max of 32-bit values
>>      atomic64 arch/arm/include/asm/atomic.h
>
>
> Hmm, I thought ftrace already did the right endianness conversions,
> although it is possible it could have missed one.
>
> The atomic instructions should work fine for BE8; they're using
> STRD, which should be correct.

atomic64_read and atomic64_set are fine, but atomic64_add,
atomic64_add_return, atomic64_sub, atomic64_sub_return, etc. are not OK.
The issue is in the arithmetic operations, not the loads/stores.

Thanks,
Victor

>
>> I've attached the board-independent (core) patch from my work that made
>> a Pandaboard ES kernel run in BE mode. You can check my version of the
>> fixes for the above issues in the patch. Look for the corresponding file
>> changes. Please rework them according to the style and rules of your
>> patch series. Note the patch itself contains fixes for a few other
>> issues that are already fixed in your series. I'll cross-check and
>> compare the individual areas later. I think you may find the attached
>> patch interesting as well; it was developed independently from yours
>> but matches it very well.
>
>
>>
>
> --
> Ben Dooks                               http://www.codethink.co.uk/
> Senior Engineer                         Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 13+ messages in thread

* updates for be8 patch series
       [not found] <CAA3XUr1o0cOCybSR=pCUT-WYP4xus5VrES6Hyzo-Wy4tcutTvQ@mail.gmail.com>
@ 2013-07-23 18:02 ` Ben Dooks
  0 siblings, 0 replies; 13+ messages in thread
From: Ben Dooks @ 2013-07-23 18:02 UTC (permalink / raw)
  To: linux-arm-kernel

On 23/07/13 18:40, Victor Kamensky wrote:
> [resend in plain-text mode]
>
> Hi Ben,
>
> Wrt BE8 little endian instructions you will need to fix a couple more places:
>      ftrace arch/arm/kernel/ftrace.c
>      kprobes arch/arm/kernel/kprobes.c
> Also in big endian mode the atomic64_xxx functions will need fixes, otherwise
> perf counters will be truncated at the max of 32-bit values
>      atomic64 arch/arm/include/asm/atomic.h
>
> I've attached the board-independent (core) patch from my work that made
> a Pandaboard ES kernel run in BE mode. You can check my version of the
> fixes for the above issues in the patch. Look for the corresponding file
> changes. Please rework them according to the style and rules of your
> patch series. Note the patch itself contains fixes for a few other
> issues that are already fixed in your series. I'll cross-check and
> compare the individual areas later. I think you may find the attached
> patch interesting as well; it was developed independently from yours
> but matches it very well.
>
> Wrt testing: ftrace was tested on a Pandaboard ES, and the kprobes
> changes were tested with SystemTap at some point in the patch's history
> on a Pandaboard ES (not Thumb mode though); they may also have
> deteriorated while being ported across kernel versions, so it would be
> a good idea to retest the last patch. atomic64 was tested on a
> Pandaboard ES and a Marvell Armada XP.

Please wrap your emails to <80 characters; this was really difficult
to read.

The atomic64 ops are an interesting one; I still think the in-kernel
code is correct. Why is atomic64 being used on 32-bit? If you are
trying to decompose a 64-bit value into two 32-bit halves, then that's
probably the problem.

On the relocation code: the R_ARM instruction relocations should only
ever target instructions, and can therefore be safely swapped using the
correct opcode accessors. R_ARM_ABS32, IIRC, is always in the same
ordering as the CPU would access data.

I am surprised the kprobes stuff works, given that it calls
patch_text(), which already uses <asm/opcodes.h> to do the relevant
byte-swapping. I think you only need to apply swaps to anything
that it loads from memory.

Which version of Linux did you patch?

I think this is the necessary change to kprobes:

diff --git a/arch/arm/kernel/kprobes.c b/arch/arm/kernel/kprobes.c
index 170e9f3..c548356 100644
--- a/arch/arm/kernel/kprobes.c
+++ b/arch/arm/kernel/kprobes.c
@@ -26,6 +26,7 @@
  #include <linux/stop_machine.h>
  #include <linux/stringify.h>
  #include <asm/traps.h>
+#include <asm/opcodes.h>
  #include <asm/cacheflush.h>

  #include "kprobes.h"
@@ -62,10 +63,10 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
  #ifdef CONFIG_THUMB2_KERNEL
         thumb = true;
         addr &= ~1; /* Bit 0 would normally be set to indicate Thumb code */
-       insn = ((u16 *)addr)[0];
+       insn = __mem_to_opcode_thumb16(((u16 *)addr)[0]);
         if (is_wide_instruction(insn)) {
                 insn <<= 16;
-               insn |= ((u16 *)addr)[1];
+               insn |= __mem_to_opcode_thumb16(((u16 *)addr)[1]);
                 decode_insn = thumb32_kprobe_decode_insn;
         } else
                 decode_insn = thumb16_kprobe_decode_insn;
@@ -73,7 +74,7 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
         thumb = false;
         if (addr & 0x3)
                 return -EINVAL;
-       insn = *p->addr;
+       insn = __mem_to_opcode_arm(*p->addr);
         decode_insn = arm_kprobe_decode_insn;
  #endif



-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* updates for be8 patch series
  2013-07-23 17:53   ` Victor Kamensky
@ 2013-07-23 18:02     ` Ben Dooks
  2013-07-23 18:03     ` Victor Kamensky
  1 sibling, 0 replies; 13+ messages in thread
From: Ben Dooks @ 2013-07-23 18:02 UTC (permalink / raw)
  To: linux-arm-kernel

On 23/07/13 18:53, Victor Kamensky wrote:
> On 23 July 2013 10:40, Ben Dooks<ben.dooks@codethink.co.uk>  wrote:
>> On 23/07/13 18:24, Victor Kamensky wrote:
>>>
>>> Hi Ben,
>>>
>>> Wrt BE8 little endian instructions you will need to fix a couple more
>>> places:
>>>       ftrace arch/arm/kernel/ftrace.c
>>>       kprobes arch/arm/kernel/kprobes.c
>>> Also in big endian mode the atomic64_xxx functions will need fixes,
>>> otherwise perf counters will be truncated at the max of 32-bit values
>>>       atomic64 arch/arm/include/asm/atomic.h
>>
>>
>> Hmm, I thought ftrace already did the right endianness conversions,
>> although it is possible it could have missed one.
>
>> The atomic instructions should work fine for BE8; they're using
>> STRD, which should be correct.
>
> atomic64_read and atomic64_set are fine, but atomic64_add,
> atomic64_add_return, atomic64_sub, atomic64_sub_return, etc. are not OK.
> The issue is in the arithmetic operations, not the loads/stores.

I'm still surprised these don't work. What exact situation are you
seeing the failures in? Code would be very helpful here.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 13+ messages in thread

* updates for be8 patch series
  2013-07-23 17:53   ` Victor Kamensky
  2013-07-23 18:02     ` Ben Dooks
@ 2013-07-23 18:03     ` Victor Kamensky
  2013-07-23 18:30       ` Ben Dooks
  1 sibling, 1 reply; 13+ messages in thread
From: Victor Kamensky @ 2013-07-23 18:03 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ben,

Sorry for the multiple posts. I am new to this, and I could not get the
mailing list to accept my original email. For the sake of others, so
people can see the patch being discussed, here it goes again with the
patch file inlined. Hope it works this time.

Wrt BE8 little endian instructions you will need to fix a couple more places:
    ftrace arch/arm/kernel/ftrace.c
    kprobes arch/arm/kernel/kprobes.c
Also in big endian mode the atomic64_xxx functions will need fixes, otherwise
perf counters will be truncated at the max of 32-bit values
    atomic64 arch/arm/include/asm/atomic.h

I've attached the board-independent (core) patch from my work that made
a Pandaboard ES kernel run in BE mode. You can check my version of the
fixes for the above issues in the patch. Look for the corresponding file
changes. Please rework them according to the style and rules of your
patch series. Note the patch itself contains fixes for a few other
issues that are already fixed in your series. I'll cross-check and
compare the individual areas later. I think you may find the attached
patch interesting as well; it was developed independently from yours
but matches it very well.

Wrt testing: ftrace was tested on a Pandaboard ES, and the kprobes
changes were tested with SystemTap at some point in the patch's history
on a Pandaboard ES (not Thumb mode though); they may also have
deteriorated while being ported across kernel versions, so it would be
a good idea to retest the last patch. atomic64 was tested on a
Pandaboard ES and a Marvell Armada XP.

Thanks,
Victor

Index: linux-linaro-tracking/arch/arm/Makefile
===================================================================
--- linux-linaro-tracking.orig/arch/arm/Makefile
+++ linux-linaro-tracking/arch/arm/Makefile
@@ -16,6 +16,7 @@ LDFLAGS        :=
 LDFLAGS_vmlinux    :=-p --no-undefined -X
 ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
 LDFLAGS_vmlinux    += --be8
+KBUILD_LDFLAGS_MODULE += --be8
 endif

 OBJCOPYFLAGS    :=-O binary -R .comment -S
@@ -43,12 +44,19 @@ ifeq ($(CONFIG_CC_STACKPROTECTOR),y)
 KBUILD_CFLAGS    +=-fstack-protector
 endif

+# Passing the endian compilation flag in both CPPFLAGS and CFLAGS. There are
+# places where CFLAGS alone are used (cmd_record_mcount). And it has to
+# be in CPPFLAGS because it affects what macros (__ARMEB__, __BYTE_ORDER__,
+# __FLOAT_WORD_ORDER__) are defined. So replicating the flag seems to be
+# a good compromise.
 ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
 KBUILD_CPPFLAGS    += -mbig-endian
+KBUILD_CFLAGS    += -mbig-endian
 AS        += -EB
 LD        += -EB
 else
 KBUILD_CPPFLAGS    += -mlittle-endian
+KBUILD_CFLAGS    += -mlittle-endian
 AS        += -EL
 LD        += -EL
 endif
Index: linux-linaro-tracking/arch/arm/boot/compressed/Makefile
===================================================================
--- linux-linaro-tracking.orig/arch/arm/boot/compressed/Makefile
+++ linux-linaro-tracking/arch/arm/boot/compressed/Makefile
@@ -138,6 +138,24 @@ endif
 ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
 LDFLAGS_vmlinux += --be8
 endif
+
+# Passing the endian compilation flag in both CPPFLAGS and CFLAGS. There are
+# places where CFLAGS alone are used (cmd_record_mcount). And it has to
+# be in CPPFLAGS because it affects what macros (__ARMEB__, __BYTE_ORDER__,
+# __FLOAT_WORD_ORDER__) are defined. So replicating the flag seems to be
+# a good compromise.
+ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
+KBUILD_CPPFLAGS    += -mbig-endian
+KBUILD_CFLAGS    += -mbig-endian
+AS        += -EB
+LD        += -EB
+else
+KBUILD_CPPFLAGS    += -mlittle-endian
+KBUILD_CFLAGS    += -mlittle-endian
+AS        += -EL
+LD        += -EL
+endif
+
 # ?
 LDFLAGS_vmlinux += -p
 # Report unresolved symbol references
Index: linux-linaro-tracking/arch/arm/boot/compressed/head.S
===================================================================
--- linux-linaro-tracking.orig/arch/arm/boot/compressed/head.S
+++ linux-linaro-tracking/arch/arm/boot/compressed/head.S
@@ -141,6 +141,12 @@ start:
 #endif
         mov    r7, r1            @ save architecture ID
         mov    r8, r2            @ save atags pointer
+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+                /*
+                 * Switch to bigendian mode
+                 */
+                setend  be
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */

 #ifndef __ARM_ARCH_2__
         /*
Index: linux-linaro-tracking/arch/arm/include/asm/setup.h
===================================================================
--- linux-linaro-tracking.orig/arch/arm/include/asm/setup.h
+++ linux-linaro-tracking/arch/arm/include/asm/setup.h
@@ -53,4 +53,8 @@ extern int arm_add_memory(phys_addr_t st
 extern void early_print(const char *str, ...);
 extern void dump_machine_table(void);

+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+void byteswap_tags(struct tag *t);
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
+
 #endif
Index: linux-linaro-tracking/arch/arm/kernel/Makefile
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/Makefile
+++ linux-linaro-tracking/arch/arm/kernel/Makefile
@@ -22,6 +22,7 @@ obj-y        := elf.o entry-armv.o entry-commo
 obj-$(CONFIG_ATAGS)        += atags_parse.o
 obj-$(CONFIG_ATAGS_PROC)    += atags_proc.o
 obj-$(CONFIG_DEPRECATED_PARAM_STRUCT) += atags_compat.o
+obj-$(CONFIG_LITTLE_ENDIAN_LOADER) += atags_byteswap.o

 obj-$(CONFIG_OC_ETM)        += etm.o
 obj-$(CONFIG_CPU_IDLE)        += cpuidle.o
Index: linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c
===================================================================
--- /dev/null
+++ linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c
@@ -0,0 +1,119 @@
+#include <linux/slab.h>
+#include <linux/proc_fs.h>
+#include <linux/swab.h>
+#include <asm/setup.h>
+#include <asm/types.h>
+#include <asm/page.h>
+
+#define byteswap32(x) (x) = __swab32((x))
+#define byteswap16(x) (x) = __swab16((x))
+
+static void __init byteswap_tag_core(struct tag *tag)
+{
+  byteswap32(tag->u.core.flags);
+  byteswap32(tag->u.core.pagesize);
+  byteswap32(tag->u.core.rootdev);
+}
+
+static void __init byteswap_tag_mem32(struct tag *tag)
+{
+  byteswap32(tag->u.mem.size);
+  byteswap32(tag->u.mem.start);
+}
+
+static void __init byteswap_tag_videotext(struct tag *tag)
+{
+  byteswap16(tag->u.videotext.video_page);
+  byteswap16(tag->u.videotext.video_ega_bx);
+  byteswap16(tag->u.videotext.video_points);
+}
+
+static void __init byteswap_tag_ramdisk(struct tag *tag)
+{
+  byteswap32(tag->u.ramdisk.flags);
+  byteswap32(tag->u.ramdisk.size);
+  byteswap32(tag->u.ramdisk.start);
+}
+
+static void __init byteswap_tag_initrd(struct tag *tag)
+{
+  byteswap32(tag->u.initrd.start);
+  byteswap32(tag->u.initrd.size);
+}
+
+static void __init byteswap_tag_serialnr(struct tag *tag)
+{
+  byteswap32(tag->u.serialnr.low);
+  byteswap32(tag->u.serialnr.high);
+}
+
+static void __init byteswap_tag_revision(struct tag *tag)
+{
+    byteswap32(tag->u.revision.rev);
+}
+
+static void __init byteswap_tag_videolfb(struct tag *tag)
+{
+    byteswap16(tag->u.videolfb.lfb_width);
+    byteswap16(tag->u.videolfb.lfb_height);
+    byteswap16(tag->u.videolfb.lfb_depth);
+    byteswap16(tag->u.videolfb.lfb_linelength);
+    byteswap32(tag->u.videolfb.lfb_base);
+    byteswap32(tag->u.videolfb.lfb_size);
+}
+
+static void __init byteswap_tag_acorn(struct tag *tag)
+{
+    byteswap32(tag->u.acorn.memc_control_reg);
+    byteswap32(tag->u.acorn.vram_pages);
+}
+
+static void __init byteswap_tag_memclk(struct tag *tag)
+{
+    byteswap32(tag->u.memclk.fmemclk);
+}
+
+void __init byteswap_tags(struct tag *t)
+{
+  for (; t->hdr.size; t = tag_next(t)) {
+    byteswap32(t->hdr.size);
+    byteswap32(t->hdr.tag);
+    switch(t->hdr.tag) {
+    case ATAG_CORE:
+      byteswap_tag_core(t);
+      break;
+    case ATAG_MEM:
+      byteswap_tag_mem32(t);
+      break;
+    case ATAG_VIDEOTEXT:
+      byteswap_tag_videotext(t);
+      break;
+    case ATAG_RAMDISK:
+      byteswap_tag_ramdisk(t);
+      break;
+    case ATAG_INITRD2:
+      byteswap_tag_initrd(t);
+      break;
+    case ATAG_SERIAL:
+      byteswap_tag_serialnr(t);
+      break;
+    case ATAG_REVISION:
+      byteswap_tag_revision(t);
+      break;
+    case ATAG_VIDEOLFB:
+      byteswap_tag_videolfb(t);
+      break;
+    case ATAG_CMDLINE:
+      break;
+    case ATAG_ACORN:
+      byteswap_tag_acorn(t);
+      break;
+    case ATAG_MEMCLK:
+      byteswap_tag_memclk(t);
+      break;
+    default:
+      /* Need to complain? */
+      break;
+    }
+  }
+}
Index: linux-linaro-tracking/arch/arm/kernel/head-common.S
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/head-common.S
+++ linux-linaro-tracking/arch/arm/kernel/head-common.S
@@ -48,6 +48,9 @@ __vet_atags:
     bne    1f

     ldr    r5, [r2, #0]
+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+        rev     r5, r5
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
 #ifdef CONFIG_OF_FLATTREE
     ldr    r6, =OF_DT_MAGIC        @ is it a DTB?
     cmp    r5, r6
@@ -57,6 +60,9 @@ __vet_atags:
     cmpne    r5, #ATAG_CORE_SIZE_EMPTY
     bne    1f
     ldr    r5, [r2, #4]
+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+        rev     r5, r5
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
     ldr    r6, =ATAG_CORE
     cmp    r5, r6
     bne    1f
Index: linux-linaro-tracking/arch/arm/kernel/module.c
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/module.c
+++ linux-linaro-tracking/arch/arm/kernel/module.c
@@ -45,6 +45,65 @@ void *module_alloc(unsigned long size)
 }
 #endif

+/*
+ * For relocations in executable sections, if we are configured with
+ * CONFIG_CPU_ENDIAN_BE8 we assume that code is already in little-endian
+ * byteswapped form and we need to byteswap the value when we read and
+ * when we write. If the relocation is of type R_ARM_ABS32 we assume it
+ * is a relocation for data and do not byteswap when it is applied.
+ * This heuristic-based approach may break in the future.
+ *
+ * It seems a more correct way to handle be8 in kernel modules is to
+ * build the module without the --be8 option, but perform an instruction
+ * byteswap when the module sections are loaded, after all relocations
+ * are applied. To do that we need to read all the mapping symbols
+ * ($a, $d, $t), sort them, and apply byteswaps to the $a and $t entries.
+ */
+static inline u32 read_section_u32(Elf32_Shdr *dstsec, unsigned long loc)
+{
+       u32 retval = *(u32 *)loc;
+
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        if (dstsec->sh_flags & SHF_EXECINSTR) {
+                retval = __swab32(retval);
+        }
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+        return retval;
+}
+
+static inline void write_section_u32(Elf32_Shdr *dstsec, unsigned long loc, u32 value)
+{
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        if (dstsec->sh_flags & SHF_EXECINSTR) {
+                value = __swab32(value);
+        }
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+        *(u32 *) loc = value;
+}
+
+#ifdef CONFIG_THUMB2_KERNEL
+static inline u16 read_section_u16(Elf32_Shdr *dstsec, unsigned long loc)
+{
+        u16 retval = *(u16 *) loc;
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        if (dstsec->sh_flags & SHF_EXECINSTR) {
+                retval = __swab16(retval);
+        }
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+        return retval;
+}
+
+static inline void write_section_u16(Elf32_Shdr *dstsec, unsigned long loc, u16 value)
+{
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        if (dstsec->sh_flags & SHF_EXECINSTR) {
+                value = __swab16(value);
+        }
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+        *(u16 *) loc = value;
+}
+#endif /* CONFIG_THUMB2_KERNEL */
+
 int
 apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
            unsigned int relindex, struct module *module)
@@ -60,6 +119,7 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
         Elf32_Sym *sym;
         const char *symname;
         s32 offset;
+                u32 loc_u32;
 #ifdef CONFIG_THUMB2_KERNEL
         u32 upper, lower, sign, j1, j2;
 #endif
@@ -95,7 +155,8 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
         case R_ARM_PC24:
         case R_ARM_CALL:
         case R_ARM_JUMP24:
-            offset = (*(u32 *)loc & 0x00ffffff) << 2;
+                        loc_u32 = read_section_u32(dstsec, loc);
+            offset = (loc_u32 & 0x00ffffff) << 2;
             if (offset & 0x02000000)
                 offset -= 0x04000000;

@@ -112,27 +173,33 @@ apply_relocate(Elf32_Shdr *sechdrs, cons

             offset >>= 2;

-            *(u32 *)loc &= 0xff000000;
-            *(u32 *)loc |= offset & 0x00ffffff;
+            loc_u32 &= 0xff000000;
+            loc_u32 |= offset & 0x00ffffff;
+                        write_section_u32(dstsec, loc, loc_u32);
             break;

-           case R_ARM_V4BX:
+                case R_ARM_V4BX:
+                        loc_u32 = read_section_u32(dstsec, loc);
                /* Preserve Rm and the condition code. Alter
             * other bits to re-code instruction as
             * MOV PC,Rm.
             */
-               *(u32 *)loc &= 0xf000000f;
-               *(u32 *)loc |= 0x01a0f000;
+               loc_u32 &= 0xf000000f;
+               loc_u32 |= 0x01a0f000;
+                       write_section_u32(dstsec, loc, loc_u32);
                break;

         case R_ARM_PREL31:
-            offset = *(u32 *)loc + sym->st_value - loc;
-            *(u32 *)loc = offset & 0x7fffffff;
+                        loc_u32 = read_section_u32(dstsec, loc);
+            offset = loc_u32 + sym->st_value - loc;
+            loc_u32 = offset & 0x7fffffff;
+                        write_section_u32(dstsec, loc, loc_u32);
             break;

         case R_ARM_MOVW_ABS_NC:
         case R_ARM_MOVT_ABS:
-            offset = *(u32 *)loc;
+                        loc_u32 = read_section_u32(dstsec, loc);
+            offset = loc_u32;
             offset = ((offset & 0xf0000) >> 4) | (offset & 0xfff);
             offset = (offset ^ 0x8000) - 0x8000;

@@ -140,16 +207,17 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
             if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS)
                 offset >>= 16;

-            *(u32 *)loc &= 0xfff0f000;
-            *(u32 *)loc |= ((offset & 0xf000) << 4) |
-                    (offset & 0x0fff);
+            loc_u32 &= 0xfff0f000;
+            loc_u32 |= ((offset & 0xf000) << 4) |
+                                   (offset & 0x0fff);
+                        write_section_u32(dstsec, loc, loc_u32);
             break;

 #ifdef CONFIG_THUMB2_KERNEL
         case R_ARM_THM_CALL:
         case R_ARM_THM_JUMP24:
-            upper = *(u16 *)loc;
-            lower = *(u16 *)(loc + 2);
+                        upper = read_section_u16(dstsec, loc);
+            lower = read_section_u16(dstsec, loc + 2);

             /*
              * 25 bit signed address range (Thumb-2 BL and B.W
@@ -198,17 +266,19 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
             sign = (offset >> 24) & 1;
             j1 = sign ^ (~(offset >> 23) & 1);
             j2 = sign ^ (~(offset >> 22) & 1);
-            *(u16 *)loc = (u16)((upper & 0xf800) | (sign << 10) |
-                        ((offset >> 12) & 0x03ff));
-            *(u16 *)(loc + 2) = (u16)((lower & 0xd000) |
-                          (j1 << 13) | (j2 << 11) |
-                          ((offset >> 1) & 0x07ff));
+                        write_section_u16(dstsec, loc,
+                                          (u16)((upper & 0xf800) | (sign << 10) |
+                                                ((offset >> 12) & 0x03ff)));
+                        write_section_u16(dstsec, loc + 2,
+                                          (u16)((lower & 0xd000) |
+                                                (j1 << 13) | (j2 << 11) |
+                                                ((offset >> 1) & 0x07ff)));
             break;

         case R_ARM_THM_MOVW_ABS_NC:
         case R_ARM_THM_MOVT_ABS:
-            upper = *(u16 *)loc;
-            lower = *(u16 *)(loc + 2);
+                        upper = read_section_u16(dstsec, loc);
+            lower = read_section_u16(dstsec, loc + 2);

             /*
              * MOVT/MOVW instructions encoding in Thumb-2:
@@ -229,12 +299,14 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
             if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_ABS)
                 offset >>= 16;

-            *(u16 *)loc = (u16)((upper & 0xfbf0) |
-                        ((offset & 0xf000) >> 12) |
-                        ((offset & 0x0800) >> 1));
-            *(u16 *)(loc + 2) = (u16)((lower & 0x8f00) |
-                          ((offset & 0x0700) << 4) |
-                          (offset & 0x00ff));
+                        write_section_u16(dstsec, loc,
+                                          (u16)((upper & 0xfbf0) |
+                                                ((offset & 0xf000) >> 12) |
+                                                ((offset & 0x0800) >> 1)));
+                        write_section_u16(dstsec, loc + 2,
+                                          (u16)((lower & 0x8f00) |
+                                                ((offset & 0x0700) << 4) |
+                                                (offset & 0x00ff)));
             break;
 #endif

Index: linux-linaro-tracking/arch/arm/mm/Kconfig
===================================================================
--- linux-linaro-tracking.orig/arch/arm/mm/Kconfig
+++ linux-linaro-tracking/arch/arm/mm/Kconfig
@@ -695,6 +695,15 @@ config CPU_ENDIAN_BE32
     help
       Support for the BE-32 (big-endian) mode on pre-ARMv6 processors.

+config LITTLE_ENDIAN_LOADER
+        bool
+        depends on CPU_BIG_ENDIAN
+        default y
+        help
+          If Linux should operate in big-endian mode, but the bootloader
+          (e.g. U-Boot) is still in little-endian mode and passes boot
+          information in little-endian form, set this flag.
+
 config CPU_HIGH_VECTOR
     depends on !MMU && CPU_CP15 && !CPU_ARM740T
     bool "Select the High exception vector"
Index: linux-linaro-tracking/arch/arm/mm/alignment.c
===================================================================
--- linux-linaro-tracking.orig/arch/arm/mm/alignment.c
+++ linux-linaro-tracking/arch/arm/mm/alignment.c
@@ -792,6 +792,10 @@ do_alignment(unsigned long addr, unsigne

     regs->ARM_pc += isize;

+#ifdef CONFIG_CPU_ENDIAN_BE8
+        instr = __swab32(instr); /* little endian instruction */
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+
     switch (CODING_BITS(instr)) {
     case 0x00000000:    /* 3.13.4 load/store instruction extensions */
         if (LDSTHD_I_BIT(instr))
Index: linux-linaro-tracking/arch/arm/kernel/head.S
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/head.S
+++ linux-linaro-tracking/arch/arm/kernel/head.S
@@ -89,6 +89,12 @@ ENTRY(stext)
     @ ensure svc mode and all interrupts masked
     safe_svcmode_maskall r9

+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+    /*
+    * Switch to bigendian mode
+    */
+    setend    be
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
     mrc    p15, 0, r9, c0, c0        @ get processor id
     bl    __lookup_processor_type        @ r5=procinfo r9=cpuid
     movs    r10, r5                @ invalid processor (r5=0)?
@@ -356,6 +362,12 @@ ENTRY(secondary_startup)
 #endif
     safe_svcmode_maskall r9

+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+    /*
+    * Switch to bigendian mode
+    */
+    setend    be
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
     mrc    p15, 0, r9, c0, c0        @ get processor id
     bl    __lookup_processor_type
     movs    r10, r5                @ invalid processor?
@@ -427,6 +439,9 @@ __enable_mmu:
 #ifdef CONFIG_CPU_ICACHE_DISABLE
     bic    r0, r0, #CR_I
 #endif
+#ifdef CONFIG_CPU_ENDIAN_BE8
+    orr    r0, r0, #CR_EE    @ big-endian page tables
+#endif
 #ifdef CONFIG_ARM_LPAE
     mov    r5, #0
     mcrr    p15, 0, r4, r5, c2        @ load TTBR0
@@ -592,8 +607,14 @@ __fixup_a_pv_table:
     b    2f
 1:    add     r7, r3
     ldrh    ip, [r7, #2]
+#ifdef __ARMEB__
+        rev     ip, ip   @ byteswap instruction
+#endif /* __ARMEB__ */
     and    ip, 0x8f00
     orr    ip, r6    @ mask in offset bits 31-24
+#ifdef __ARMEB__
+        rev     ip, ip   @ byteswap instruction
+#endif /* __ARMEB__ */
     strh    ip, [r7, #2]
 2:    cmp    r4, r5
     ldrcc    r7, [r4], #4    @ use branch for delay slot
@@ -602,8 +623,14 @@ __fixup_a_pv_table:
 #else
     b    2f
 1:    ldr    ip, [r7, r3]
+#ifdef __ARMEB__
+        rev     ip, ip   @ byteswap instruction
+#endif /* __ARMEB__ */
     bic    ip, ip, #0x000000ff
     orr    ip, ip, r6    @ mask in offset bits 31-24
+#ifdef __ARMEB__
+        rev     ip, ip   @ byteswap instruction
+#endif /* __ARMEB__ */
     str    ip, [r7, r3]
 2:    cmp    r4, r5
     ldrcc    r7, [r4], #4    @ use branch for delay slot
Index: linux-linaro-tracking/arch/arm/kernel/kprobes.c
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/kprobes.c
+++ linux-linaro-tracking/arch/arm/kernel/kprobes.c
@@ -66,14 +66,24 @@ int __kprobes arch_prepare_kprobe(struct
     if (is_wide_instruction(insn)) {
         insn <<= 16;
         insn |= ((u16 *)addr)[1];
+#ifdef CONFIG_CPU_ENDIAN_BE8
+                insn = __swab32(insn); /* little endian instruction */
+#endif
         decode_insn = thumb32_kprobe_decode_insn;
-    } else
-        decode_insn = thumb16_kprobe_decode_insn;
+    } else {
+                decode_insn = thumb16_kprobe_decode_insn;
+#ifdef CONFIG_CPU_ENDIAN_BE8
+                insn = __swab16(insn); /* little endian instruction */
+#endif
+        }
 #else /* !CONFIG_THUMB2_KERNEL */
     thumb = false;
     if (addr & 0x3)
         return -EINVAL;
     insn = *p->addr;
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        insn = __swab32(insn); /* little endian instruction */
+#endif
     decode_insn = arm_kprobe_decode_insn;
 #endif

@@ -89,7 +99,11 @@ int __kprobes arch_prepare_kprobe(struct
         if (!p->ainsn.insn)
             return -ENOMEM;
         for (is = 0; is < MAX_INSN_SIZE; ++is)
+#ifndef CONFIG_CPU_ENDIAN_BE8
             p->ainsn.insn[is] = tmp_insn[is];
+#else
+                        p->ainsn.insn[is] = __swab32(tmp_insn[is]); /* little endian */
+#endif
         flush_insns(p->ainsn.insn,
                 sizeof(p->ainsn.insn[0]) * MAX_INSN_SIZE);
         p->ainsn.insn_fn = (kprobe_insn_fn_t *)
@@ -113,10 +127,17 @@ void __kprobes arch_arm_kprobe(struct kp
         /* Remove any Thumb flag */
         addr = (void *)((uintptr_t)p->addr & ~1);

-        if (is_wide_instruction(p->opcode))
+        if (is_wide_instruction(p->opcode)) {
             brkp = KPROBE_THUMB32_BREAKPOINT_INSTRUCTION;
-        else
+#ifdef CONFIG_CPU_ENDIAN_BE8
+                brkp = __swab32(brkp);
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+        } else {
             brkp = KPROBE_THUMB16_BREAKPOINT_INSTRUCTION;
+#ifdef CONFIG_CPU_ENDIAN_BE8
+                brkp = __swab16(brkp);
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+            }
     } else {
         kprobe_opcode_t insn = p->opcode;

@@ -127,6 +148,10 @@ void __kprobes arch_arm_kprobe(struct kp
             brkp |= 0xe0000000;  /* Unconditional instruction */
         else
             brkp |= insn & 0xf0000000;  /* Copy condition from insn */
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        brkp = __swab32(brkp);
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+
     }

     patch_text(addr, brkp);
@@ -144,8 +169,12 @@ int __kprobes __arch_disarm_kprobe(void
 {
     struct kprobe *kp = p;
     void *addr = (void *)((uintptr_t)kp->addr & ~1);
+    kprobe_opcode_t insn = kp->opcode;

-    __patch_text(addr, kp->opcode);
+#ifdef CONFIG_CPU_ENDIAN_BE8
+        insn = __swab32(insn); /* little endian instruction */
+#endif
+    __patch_text(addr, insn);

     return 0;
 }
Index: linux-linaro-tracking/arch/arm/kernel/traps.c
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/traps.c
+++ linux-linaro-tracking/arch/arm/kernel/traps.c
@@ -426,6 +426,10 @@ asmlinkage void __exception do_undefinst
         goto die_sig;
     }

+#ifdef CONFIG_CPU_ENDIAN_BE8
+        instr = __swab32(instr); /* little endian instruction */
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+
     if (call_undef_hook(regs, instr) == 0)
         return;

Index: linux-linaro-tracking/arch/arm/kernel/ftrace.c
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/ftrace.c
+++ linux-linaro-tracking/arch/arm/kernel/ftrace.c
@@ -100,10 +100,18 @@ static int ftrace_modify_code(unsigned l
         if (probe_kernel_read(&replaced, (void *)pc, MCOUNT_INSN_SIZE))
             return -EFAULT;

+#ifdef CONFIG_CPU_ENDIAN_BE8
+        replaced = __swab32(replaced); /* little endian instruction */
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+
         if (replaced != old)
             return -EINVAL;
     }

+#ifdef CONFIG_CPU_ENDIAN_BE8
+        new = __swab32(new); /* little endian instruction */
+#endif /* CONFIG_CPU_ENDIAN_BE8 */
+
     if (probe_kernel_write((void *)pc, &new, MCOUNT_INSN_SIZE))
         return -EPERM;

Index: linux-linaro-tracking/arch/arm/kernel/atags_parse.c
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/atags_parse.c
+++ linux-linaro-tracking/arch/arm/kernel/atags_parse.c
@@ -216,6 +216,10 @@ struct machine_desc * __init setup_machi
     if (tags->hdr.tag != ATAG_CORE)
         convert_to_tag_list(tags);
 #endif
+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+    byteswap_tags(tags);
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
+
     if (tags->hdr.tag != ATAG_CORE) {
         early_print("Warning: Neither atags nor dtb found\n");
         tags = (struct tag *)&default_tags;
Index: linux-linaro-tracking/arch/arm/kernel/sleep.S
===================================================================
--- linux-linaro-tracking.orig/arch/arm/kernel/sleep.S
+++ linux-linaro-tracking/arch/arm/kernel/sleep.S
@@ -81,6 +81,13 @@ ENDPROC(cpu_resume_after_mmu)
     .data
     .align
 ENTRY(cpu_resume)
+#ifdef CONFIG_LITTLE_ENDIAN_LOADER
+        /*
+         * Switch to bigendian mode.
+         * Platform resume code most likely already did that, but just in case
+         */
+        setend  be
+#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
 #ifdef CONFIG_SMP
     mov    r1, #0            @ fall-back logical index for UP
     ALT_SMP(mrc p15, 0, r0, c0, c0, 5)
Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h
===================================================================
--- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h
+++ linux-linaro-tracking/arch/arm/include/asm/atomic.h
@@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a

     __asm__ __volatile__("@ atomic64_add\n"
 "1:    ldrexd    %0, %H0, [%3]\n"
+#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
 "    adds    %0, %0, %4\n"
 "    adc    %H0, %H0, %H4\n"
+#else
+"    adds    %H0, %H0, %H4\n"
+"    adc    %0, %0, %4\n"
+#endif
 "    strexd    %1, %0, %H0, [%3]\n"
 "    teq    %1, #0\n"
 "    bne    1b"
@@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6

     __asm__ __volatile__("@ atomic64_add_return\n"
 "1:    ldrexd    %0, %H0, [%3]\n"
+#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
 "    adds    %0, %0, %4\n"
 "    adc    %H0, %H0, %H4\n"
+#else
+"    adds    %H0, %H0, %H4\n"
+"    adc    %0, %0, %4\n"
+#endif
 "    strexd    %1, %0, %H0, [%3]\n"
 "    teq    %1, #0\n"
 "    bne    1b"
@@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a

     __asm__ __volatile__("@ atomic64_sub\n"
 "1:    ldrexd    %0, %H0, [%3]\n"
+#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
 "    subs    %0, %0, %4\n"
 "    sbc    %H0, %H0, %H4\n"
+#else
+"    subs    %H0, %H0, %H4\n"
+"    sbc    %0, %0, %4\n"
+#endif
 "    strexd    %1, %0, %H0, [%3]\n"
 "    teq    %1, #0\n"
 "    bne    1b"
@@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6

     __asm__ __volatile__("@ atomic64_sub_return\n"
 "1:    ldrexd    %0, %H0, [%3]\n"
+#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
 "    subs    %0, %0, %4\n"
 "    sbc    %H0, %H0, %H4\n"
+#else
+"    subs    %H0, %H0, %H4\n"
+"    sbc    %0, %0, %4\n"
+#endif
 "    strexd    %1, %0, %H0, [%3]\n"
 "    teq    %1, #0\n"
 "    bne    1b"
@@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi

     __asm__ __volatile__("@ atomic64_dec_if_positive\n"
 "1:    ldrexd    %0, %H0, [%3]\n"
+#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
 "    subs    %0, %0, #1\n"
 "    sbc    %H0, %H0, #0\n"
 "    teq    %H0, #0\n"
+#else
+"    subs    %H0, %H0, #1\n"
+"    sbc    %0, %0, #0\n"
+"    teq    %0, #0\n"
+#endif
 "    bmi    2f\n"
 "    strexd    %1, %0, %H0, [%3]\n"
 "    teq    %1, #0\n"
@@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at
 "    teqeq    %H0, %H5\n"
 "    moveq    %1, #0\n"
 "    beq    2f\n"
+#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
 "    adds    %0, %0, %6\n"
 "    adc    %H0, %H0, %H6\n"
+#else
+"    adds    %H0, %H0, %H6\n"
+"    adc    %0, %0, %6\n"
+#endif
 "    strexd    %2, %0, %H0, [%4]\n"
 "    teq    %2, #0\n"
 "    bne    1b\n"


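As background on why the atomic64 hunks above swap %0 and %H0: in GCC's ARM
inline assembly, %0 prints the lower-numbered register of a 64-bit pair and
%H0 the higher-numbered one, and which of the two holds the least significant
word follows the target endianness. A minimal user-space sketch of the same
significance-ordering rule, shown via the in-memory layout (an illustration
only, not kernel code):

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if this host stores the least significant byte first. */
static int is_little_endian(void)
{
    uint32_t one = 1;
    return *(const uint8_t *)&one == 1;
}

/* Return the index (0 or 1) of the 32-bit word holding the low half
 * of a 64-bit value -- the word that "adds" must target before "adc"
 * propagates the carry into the other one. */
static int low_word_index(void)
{
    uint64_t v = 0x1122334455667788ULL;
    uint32_t halves[2];

    memcpy(halves, &v, sizeof(v));
    /* On LE the low word comes first; on BE it comes second.  This is
     * the same significance ordering that makes the patch swap %0 and
     * %H0 in the CONFIG_CPU_BIG_ENDIAN variants of atomic64_add(). */
    return halves[0] == 0x55667788 ? 0 : 1;
}
```

Getting this ordering wrong does not fault; it silently corrupts the carry
propagation, which is why the symptom was perf counters wrapping at 32 bits.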
On 23 July 2013 10:53, Victor Kamensky <victor.kamensky@linaro.org> wrote:
> On 23 July 2013 10:40, Ben Dooks <ben.dooks@codethink.co.uk> wrote:
>> On 23/07/13 18:24, Victor Kamensky wrote:
>>>
>>> Hi Ben,
>>>
>>> Wrt BE8 little-endian instructions, you will need to fix a couple
>>> more places:
>>>      ftrace arch/arm/kernel/ftrace.c
>>>      kprobes arch/arm/kernel/kprobes.c
>>> Also, in big-endian mode the atomic64_xxx functions will need fixes,
>>> otherwise perf counters will be truncated at the maximum 32-bit value:
>>>      atomic64 arch/arm/include/asm/atomic.h
>>
>>
>> Hmm, I thought ftrace already did the right endianness conversions,
>> although it is possible it has missed one.
>>
>> The atomic instructions should work fine for BE8; they're using
>> STRD, which should be correct.
>
> atomic64_read and atomic64_set are fine, but atomic64_add,
> atomic64_add_return, atomic64_sub, atomic64_sub_return, etc. are not OK.
> The issue is in the arithmetic operations, not the loads/stores.
>
> Thanks,
> Victor
>
>>
>>> I've attached the board-independent (core) patch from my work that
>>> made the Pandaboard ES kernel run in BE mode. You can check my
>>> version of the fixes for the above issues in the patch; look for the
>>> corresponding file changes. Please rework them according to the style
>>> and rules of your patch series. Note the patch itself contains fixes
>>> for a few other issues that are already fixed in your series. I'll
>>> cross-check and compare the individual areas later. I think you may
>>> find the attached patch interesting as well; it was developed
>>> independently of yours but matches it very well.
>>
>>
>>>
>>
>> --
>> Ben Dooks                               http://www.codethink.co.uk/
>> Senior Engineer                         Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 13+ messages in thread

* updates for be8 patch series
  2013-07-23 18:03     ` Victor Kamensky
@ 2013-07-23 18:30       ` Ben Dooks
  2013-07-24  1:06         ` Victor Kamensky
  2013-07-24  7:31         ` Victor Kamensky
  0 siblings, 2 replies; 13+ messages in thread
From: Ben Dooks @ 2013-07-23 18:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 23/07/13 19:03, Victor Kamensky wrote:
> Hi Ben,
>
> Sorry for the multiple posts. I am new to this, and I could not get the
> mailing list to accept my original email. For the sake of others, so
> people can see the discussed patch, here it goes again with the patch
> file inlined. Hope it will make it through this time.
>
> Wrt BE8 little-endian instructions, you will need to fix a couple more
> places:
>      ftrace arch/arm/kernel/ftrace.c
>      kprobes arch/arm/kernel/kprobes.c
> Also, in big-endian mode the atomic64_xxx functions will need fixes,
> otherwise perf counters will be truncated at the maximum 32-bit value:
>      atomic64 arch/arm/include/asm/atomic.h
>
> I've attached the board-independent (core) patch from my work that made
> the Pandaboard ES kernel run in BE mode. You can check my version of the
> fixes for the above issues in the patch; look for the corresponding file
> changes. Please rework them according to the style and rules of your
> patch series. Note the patch itself contains fixes for a few other
> issues that are already fixed in your series. I'll cross-check and
> compare the individual areas later. I think you may find the attached
> patch interesting as well; it was developed independently of yours but
> matches it very well.
>
> Wrt testing: ftrace was tested on the Pandaboard ES; the kprobes changes
> were tested with SystemTap at some point in the patch's history on the
> Pandaboard ES (not in Thumb mode, though), but they may have
> deteriorated while being ported across kernel versions, so it would be
> a good idea to retest the latest patch. atomic64 was tested on the
> Pandaboard ES and the Marvell Armada XP.
>
> Thanks,
> Victor

First note: please always format your emails (where possible) to
under 80 characters per line so that they do not get mangled when
viewed on limited screens (for example, I often read email on a
24x80 terminal).

Second note: when producing patches, it is better to split them into
a series of patches that reach a goal than to send one large patch.
A series preserves a lot of useful information (in the per-patch
comments) and means that if there is a problem, individual pieces
can be lifted out for checking.


> Index: linux-linaro-tracking/arch/arm/Makefile
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/Makefile
> +++ linux-linaro-tracking/arch/arm/Makefile
> @@ -16,6 +16,7 @@ LDFLAGS        :=
>   LDFLAGS_vmlinux    :=-p --no-undefined -X
>   ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
>   LDFLAGS_vmlinux    += --be8
> +KBUILD_LDFLAGS_MODULE += --be8
>   endif
>
>   OBJCOPYFLAGS    :=-O binary -R .comment -S
> @@ -43,12 +44,19 @@ ifeq ($(CONFIG_CC_STACKPROTECTOR),y)
>   KBUILD_CFLAGS    +=-fstack-protector
>   endif
>
> +# Pass the endianness flag in both CPPFLAGS and CFLAGS. There are
> +# places where CFLAGS alone is used (cmd_record_mcount), and it has to
> +# be in CPPFLAGS because it affects which macros (__ARMEB__, __BYTE_ORDER__,
> +# __FLOAT_WORD_ORDER__) are defined, so duplicating the flag seems a
> +# good compromise.
>   ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
>   KBUILD_CPPFLAGS    += -mbig-endian
> +KBUILD_CFLAGS    += -mbig-endian
>   AS        += -EB
>   LD        += -EB
>   else
>   KBUILD_CPPFLAGS    += -mlittle-endian
> +KBUILD_CFLAGS    += -mlittle-endian
>   AS        += -EL
>   LD        += -EL
>   endif

See the comment below.

> Index: linux-linaro-tracking/arch/arm/boot/compressed/Makefile
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/boot/compressed/Makefile
> +++ linux-linaro-tracking/arch/arm/boot/compressed/Makefile
> @@ -138,6 +138,24 @@ endif
>   ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
>   LDFLAGS_vmlinux += --be8
>   endif
> +
> +# Pass the endianness flag in both CPPFLAGS and CFLAGS. There are
> +# places where CFLAGS alone is used (cmd_record_mcount), and it has to
> +# be in CPPFLAGS because it affects which macros (__ARMEB__, __BYTE_ORDER__,
> +# __FLOAT_WORD_ORDER__) are defined, so duplicating the flag seems a
> +# good compromise.
> +ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
> +KBUILD_CPPFLAGS    += -mbig-endian
> +KBUILD_CFLAGS    += -mbig-endian
> +AS        += -EB
> +LD        += -EB
> +else
> +KBUILD_CPPFLAGS    += -mlittle-endian
> +KBUILD_CFLAGS    += -mlittle-endian
> +AS        += -EL
> +LD        += -EL
> +endif
> +
>   # ?
>   LDFLAGS_vmlinux += -p
>   # Report unresolved symbol references

I've not been bitten by this one. Is there anyone else who can comment
on KBUILD_CFLAGS vs KBUILD_CPPFLAGS?

I had assumed that arch/arm/boot/compressed/Makefile would have
inherited the flags from the arch/arm/Makefile. Someone else would
be better qualified to comment on this.

> Index: linux-linaro-tracking/arch/arm/boot/compressed/head.S
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/boot/compressed/head.S
> +++ linux-linaro-tracking/arch/arm/boot/compressed/head.S
> @@ -141,6 +141,12 @@ start:
>   #endif
>           mov    r7, r1            @ save architecture ID
>           mov    r8, r2            @ save atags pointer
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +                /*
> +                 * Switch to bigendian mode
> +                 */
> +                setend  be
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>
>   #ifndef __ARM_ARCH_2__
>           /*
> Index: linux-linaro-tracking/arch/arm/include/asm/setup.h
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/include/asm/setup.h
> +++ linux-linaro-tracking/arch/arm/include/asm/setup.h
> @@ -53,4 +53,8 @@ extern int arm_add_memory(phys_addr_t st
>   extern void early_print(const char *str, ...);
>   extern void dump_machine_table(void);
>
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +void byteswap_tags(struct tag *t);
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
> +
>   #endif

You could have done

#ifdef CONFIG_LITTLE_ENDIAN_LOADER
void byteswap_tags(struct tag *t);
#else
static inline void byteswap_tags(struct tag *t) { }
#endif /* CONFIG_LITTLE_ENDIAN_LOADER */


> Index: linux-linaro-tracking/arch/arm/kernel/Makefile
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/Makefile
> +++ linux-linaro-tracking/arch/arm/kernel/Makefile
> @@ -22,6 +22,7 @@ obj-y        := elf.o entry-armv.o entry-commo
>   obj-$(CONFIG_ATAGS)        += atags_parse.o
>   obj-$(CONFIG_ATAGS_PROC)    += atags_proc.o
>   obj-$(CONFIG_DEPRECATED_PARAM_STRUCT) += atags_compat.o
> +obj-$(CONFIG_LITTLE_ENDIAN_LOADER) += atags_byteswap.o
>
>   obj-$(CONFIG_OC_ETM)        += etm.o
>   obj-$(CONFIG_CPU_IDLE)        += cpuidle.o
> Index: linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c
> ===================================================================
> --- /dev/null
> +++ linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c
> @@ -0,0 +1,119 @@
> +#include<linux/slab.h>
> +#include<linux/proc_fs.h>
> +#include<linux/swab.h>
> +#include<asm/setup.h>
> +#include<asm/types.h>
> +#include<asm/page.h>
> +
> +#define byteswap32(x) (x) = __swab32((x))
> +#define byteswap16(x) (x) = __swab16((x))
> +
> +static void __init byteswap_tag_core(struct tag *tag)
> +{
> +  byteswap32(tag->u.core.flags);
> +  byteswap32(tag->u.core.pagesize);
> +  byteswap32(tag->u.core.rootdev);
> +}
> +
> +static void __init byteswap_tag_mem32(struct tag *tag)
> +{
> +  byteswap32(tag->u.mem.size);
> +  byteswap32(tag->u.mem.start);
> +}
> +
> +static void __init byteswap_tag_videotext(struct tag *tag)
> +{
> +  byteswap16(tag->u.videotext.video_page);
> +  byteswap16(tag->u.videotext.video_ega_bx);
> +  byteswap16(tag->u.videotext.video_points);
> +}
> +
> +static void __init byteswap_tag_ramdisk(struct tag *tag)
> +{
> +  byteswap32(tag->u.ramdisk.flags);
> +  byteswap32(tag->u.ramdisk.size);
> +  byteswap32(tag->u.ramdisk.start);
> +}
> +
> +static void __init byteswap_tag_initrd(struct tag *tag)
> +{
> +  byteswap32(tag->u.initrd.start);
> +  byteswap32(tag->u.initrd.size);
> +}
> +
> +static void __init byteswap_tag_serialnr(struct tag *tag)
> +{
> +  byteswap32(tag->u.serialnr.low);
> +  byteswap32(tag->u.serialnr.high);
> +}
> +
> +static void __init byteswap_tag_revision(struct tag *tag)
> +{
> +    byteswap32(tag->u.revision.rev);
> +}
> +
> +static void __init byteswap_tag_videolfb(struct tag *tag)
> +{
> +    byteswap16(tag->u.videolfb.lfb_width);
> +    byteswap16(tag->u.videolfb.lfb_height);
> +    byteswap16(tag->u.videolfb.lfb_depth);
> +    byteswap16(tag->u.videolfb.lfb_linelength);
> +    byteswap32(tag->u.videolfb.lfb_base);
> +    byteswap32(tag->u.videolfb.lfb_size);
> +}
> +
> +static void __init byteswap_tag_acorn(struct tag *tag)
> +{
> +    byteswap32(tag->u.acorn.memc_control_reg);
> +    byteswap32(tag->u.acorn.vram_pages);
> +}
> +
> +static void __init byteswap_tag_memclk(struct tag *tag)
> +{
> +    byteswap32(tag->u.memclk.fmemclk);
> +}
> +
> +void __init byteswap_tags(struct tag *t)
> +{
> +  for (; t->hdr.size; t = tag_next(t)) {
> +    byteswap32(t->hdr.size);
> +    byteswap32(t->hdr.tag);
> +    switch(t->hdr.tag) {
> +    case ATAG_CORE:
> +      byteswap_tag_core(t);
> +      break;
> +    case ATAG_MEM:
> +      byteswap_tag_mem32(t);
> +      break;
> +    case ATAG_VIDEOTEXT:
> +      byteswap_tag_videotext(t);
> +      break;
> +    case ATAG_RAMDISK:
> +      byteswap_tag_ramdisk(t);
> +      break;
> +    case ATAG_INITRD2:
> +      byteswap_tag_initrd(t);
> +      break;
> +    case ATAG_SERIAL:
> +      byteswap_tag_serialnr(t);
> +      break;
> +    case ATAG_REVISION:
> +      byteswap_tag_revision(t);
> +      break;
> +    case ATAG_VIDEOLFB:
> +      byteswap_tag_videolfb(t);
> +      break;
> +    case ATAG_CMDLINE:
> +      break;
> +    case ATAG_ACORN:
> +      byteswap_tag_acorn(t);
> +      break;
> +    case ATAG_MEMCLK:
> +      byteswap_tag_memclk(t);
> +      break;
> +    default:
> +      /* Need to complain? */
> +      break;
> +    }
> +  }
> +}

I think swapping on read is a better idea, as it leaves
these in their original format.

PS: it looks like you have a formatting issue with either your
email client or your code. Did this pass checkpatch?
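To illustrate the swap-on-read alternative (a sketch only: `get_tag_u32()`
and this `swab32()` are hypothetical stand-ins, not existing kernel
interfaces), each tag word could be converted at the point of use instead of
rewriting the list in place:

```c
#include <stdint.h>

/* Plain 32-bit byte swap, standing in for the kernel's __swab32(). */
static uint32_t swab32(uint32_t x)
{
    return ((x & 0x000000ffU) << 24) |
           ((x & 0x0000ff00U) <<  8) |
           ((x & 0x00ff0000U) >>  8) |
           ((x & 0xff000000U) >> 24);
}

static int host_is_big_endian(void)
{
    uint32_t one = 1;
    return *(const uint8_t *)&one == 0;
}

/* Hypothetical swap-on-read accessor: bring a single tag field from
 * loader (little-endian) byte order to host order when consumed,
 * leaving the in-memory atags list untouched. */
static uint32_t get_tag_u32(uint32_t raw)
{
    /* Only swap when the kernel runs big-endian but the loader wrote
     * the tags little-endian (the CONFIG_LITTLE_ENDIAN_LOADER case). */
    return host_is_big_endian() ? swab32(raw) : raw;
}
```

Compared with byteswapping the whole list once, this leaves the raw atags
intact for anything that later re-exports them, at the cost of touching
every reader.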

> Index: linux-linaro-tracking/arch/arm/kernel/head-common.S
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/head-common.S
> +++ linux-linaro-tracking/arch/arm/kernel/head-common.S
> @@ -48,6 +48,9 @@ __vet_atags:
>       bne    1f
>
>       ldr    r5, [r2, #0]
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +        rev     r5, r5
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>   #ifdef CONFIG_OF_FLATTREE
>       ldr    r6, =OF_DT_MAGIC        @ is it a DTB?
>       cmp    r5, r6
> @@ -57,6 +60,9 @@ __vet_atags:
>       cmpne    r5, #ATAG_CORE_SIZE_EMPTY
>       bne    1f
>       ldr    r5, [r2, #4]
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +        rev     r5, r5
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>       ldr    r6, =ATAG_CORE
>       cmp    r5, r6
>       bne    1f

A bit untidy. Also, does OF_DT_MAGIC get changed by the endianness,
or is that handled somewhere else in the head code?

> Index: linux-linaro-tracking/arch/arm/kernel/module.c
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/module.c
> +++ linux-linaro-tracking/arch/arm/kernel/module.c
> @@ -45,6 +45,65 @@ void *module_alloc(unsigned long size)
>   }
>   #endif
>
> +/*
> + * For relocations in executable sections, if the kernel is configured
> + * with CONFIG_CPU_ENDIAN_BE8, we assume the code is already in
> + * byte-swapped (little-endian) form and byteswap the value both when
> + * we read it and when we write it back. For an R_ARM_ABS32 relocation
> + * we assume the target is data and apply no byteswap.
> + * This heuristic approach may break in the future.
> + *
> + * A more correct way to handle BE8 in kernel modules may be to build
> + * the module without the --be8 option and byteswap the instructions
> + * when the module sections are loaded, after all relocations have been
> + * applied. That requires reading all mapping symbols ($a, $d, $t),
> + * sorting them, and byteswapping the $a and $t ranges.
> + */
> +static inline u32 read_section_u32(Elf32_Shdr *dstsec, unsigned long loc)
> +{
> +       u32 retval = *(u32 *)loc;
> +
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        if (dstsec->sh_flags&  SHF_EXECINSTR) {
> +                retval = __swab32(retval);
> +        }
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +        return retval;
> +}
> +
> +static inline void write_section_u32(Elf32_Shdr *dstsec, unsigned long loc, u32 value)
> +{
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        if (dstsec->sh_flags&  SHF_EXECINSTR) {
> +                value = __swab32(value);
> +        }
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +        *(u32 *) loc = value;
> +}
> +
> +#ifdef CONFIG_THUMB2_KERNEL
> +static inline u16 read_section_u16(Elf32_Shdr *dstsec, unsigned long loc)
> +{
> +        u16 retval = *(u16 *) loc;
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        if (dstsec->sh_flags&  SHF_EXECINSTR) {
> +                retval = __swab16(retval);
> +        }
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +        return retval;
> +}
> +
> +static inline void write_section_u16(Elf32_Shdr *dstsec, unsigned long loc, u16 value)
> +{
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        if (dstsec->sh_flags&  SHF_EXECINSTR) {
> +                value = __swab16(value);
> +        }
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +        *(u16 *) loc = value;
> +}
> +#endif /* CONFIG_THUMB2_KERNEL */
> +
>   int
>   apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
>              unsigned int relindex, struct module *module)
> @@ -60,6 +119,7 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>           Elf32_Sym *sym;
>           const char *symname;
>           s32 offset;
> +                u32 loc_u32;
>   #ifdef CONFIG_THUMB2_KERNEL
>           u32 upper, lower, sign, j1, j2;
>   #endif
> @@ -95,7 +155,8 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>           case R_ARM_PC24:
>           case R_ARM_CALL:
>           case R_ARM_JUMP24:
> -            offset = (*(u32 *)loc&  0x00ffffff)<<  2;
> +                        loc_u32 = read_section_u32(dstsec, loc);
> +            offset = (loc_u32&  0x00ffffff)<<  2;
>               if (offset&  0x02000000)
>                   offset -= 0x04000000;
>
> @@ -112,27 +173,33 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>
>               offset>>= 2;
>
> -            *(u32 *)loc&= 0xff000000;
> -            *(u32 *)loc |= offset&  0x00ffffff;
> +            loc_u32&= 0xff000000;
> +            loc_u32 |= offset&  0x00ffffff;
> +                        write_section_u32(dstsec, loc, loc_u32);
>               break;
>
> -           case R_ARM_V4BX:
> +                case R_ARM_V4BX:
> +                        loc_u32 = read_section_u32(dstsec, loc);
>                  /* Preserve Rm and the condition code. Alter
>               * other bits to re-code instruction as
>               * MOV PC,Rm.
>               */
> -               *(u32 *)loc&= 0xf000000f;
> -               *(u32 *)loc |= 0x01a0f000;
> +               loc_u32&= 0xf000000f;
> +               loc_u32 |= 0x01a0f000;
> +                       write_section_u32(dstsec, loc, loc_u32);
>                  break;
>
>           case R_ARM_PREL31:
> -            offset = *(u32 *)loc + sym->st_value - loc;
> -            *(u32 *)loc = offset&  0x7fffffff;
> +                        loc_u32 = read_section_u32(dstsec, loc);
> +            offset = loc_u32 + sym->st_value - loc;
> +            loc_u32 = offset&  0x7fffffff;
> +                        write_section_u32(dstsec, loc, loc_u32);
>               break;
>
>           case R_ARM_MOVW_ABS_NC:
>           case R_ARM_MOVT_ABS:
> -            offset = *(u32 *)loc;
> +                        loc_u32 = read_section_u32(dstsec, loc);
> +            offset = loc_u32;
>               offset = ((offset&  0xf0000)>>  4) | (offset&  0xfff);
>               offset = (offset ^ 0x8000) - 0x8000;
>
> @@ -140,16 +207,17 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>               if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS)
>                   offset>>= 16;
>
> -            *(u32 *)loc&= 0xfff0f000;
> -            *(u32 *)loc |= ((offset&  0xf000)<<  4) |
> -                    (offset&  0x0fff);
> +            loc_u32&= 0xfff0f000;
> +            loc_u32 |= ((offset&  0xf000)<<  4) |
> +                                   (offset&  0x0fff);
> +                        write_section_u32(dstsec, loc, loc_u32);
>               break;
>
>   #ifdef CONFIG_THUMB2_KERNEL
>           case R_ARM_THM_CALL:
>           case R_ARM_THM_JUMP24:
> -            upper = *(u16 *)loc;
> -            lower = *(u16 *)(loc + 2);
> +                        upper = read_section_u16(dstsec, loc);
> +            lower = read_section_u16(dstsec, loc + 2);
>
>               /*
>                * 25 bit signed address range (Thumb-2 BL and B.W
> @@ -198,17 +266,19 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>               sign = (offset>>  24)&  1;
>               j1 = sign ^ (~(offset>>  23)&  1);
>               j2 = sign ^ (~(offset>>  22)&  1);
> -            *(u16 *)loc = (u16)((upper&  0xf800) | (sign<<  10) |
> -                        ((offset>>  12)&  0x03ff));
> -            *(u16 *)(loc + 2) = (u16)((lower&  0xd000) |
> -                          (j1<<  13) | (j2<<  11) |
> -                          ((offset>>  1)&  0x07ff));
> +                        write_section_u16(dstsec, loc,
> +                                          (u16)((upper&  0xf800) | (sign<<  10) |
> +                                                ((offset>>  12)&  0x03ff)));
> +                        write_section_u16(dstsec, loc + 2,
> +                                          (u16)((lower&  0xd000) |
> +                                                (j1<<  13) | (j2<<  11) |
> +                                                ((offset>>  1)&  0x07ff)));
>               break;
>
>           case R_ARM_THM_MOVW_ABS_NC:
>           case R_ARM_THM_MOVT_ABS:
> -            upper = *(u16 *)loc;
> -            lower = *(u16 *)(loc + 2);
> +                        upper = read_section_u16(dstsec, loc);
> +            lower = read_section_u16(dstsec, loc + 2);
>
>               /*
>                * MOVT/MOVW instructions encoding in Thumb-2:
> @@ -229,12 +299,14 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>               if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_ABS)
>                   offset>>= 16;
>
> -            *(u16 *)loc = (u16)((upper&  0xfbf0) |
> -                        ((offset&  0xf000)>>  12) |
> -                        ((offset&  0x0800)>>  1));
> -            *(u16 *)(loc + 2) = (u16)((lower&  0x8f00) |
> -                          ((offset&  0x0700)<<  4) |
> -                          (offset&  0x00ff));
> +                        write_section_u16(dstsec, loc,
> +                                          (u16)((upper&  0xfbf0) |
> +                                                ((offset&  0xf000)>>  12) |
> +                                                ((offset&  0x0800)>>  1)));
> +                        write_section_u16(dstsec, loc + 2,
> +                                          (u16)((lower&  0x8f00) |
> +                                                ((offset&  0x0700)<<  4) |
> +                                                (offset&  0x00ff)));
>               break;
>   #endif
>

I'm still not sure about this; I think using <asm/opcodes.h> is
better here. I'm also not sure there's much point in checking the
section flags, as the relocation types pretty much define which
endianness the target will be in.
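For reference, the kernel's <asm/opcodes.h> provides __mem_to_opcode_arm()
and __opcode_to_mem_arm() (plus Thumb variants) that reduce to a byte swap
only when CONFIG_CPU_ENDIAN_BE8 is set. A stand-alone sketch of the
conversion they perform (these are re-implementations for illustration;
module.c would use the kernel macros themselves):

```c
#include <stdint.h>

/* Stand-in for CONFIG_CPU_ENDIAN_BE8: in a BE8 kernel, data is
 * big-endian but instructions stay little-endian in memory. */
#define CPU_ENDIAN_BE8 1

static uint32_t swab32(uint32_t x)
{
    return ((x & 0x000000ffU) << 24) |
           ((x & 0x0000ff00U) <<  8) |
           ((x & 0x00ff0000U) >>  8) |
           ((x & 0xff000000U) >> 24);
}

/* Mirror of __mem_to_opcode_arm(): bring an ARM instruction read from
 * memory into canonical (host) order so the relocation arithmetic can
 * operate on it. */
static uint32_t mem_to_opcode_arm(uint32_t insn)
{
#if CPU_ENDIAN_BE8
    return swab32(insn);
#else
    return insn;
#endif
}

/* Mirror of __opcode_to_mem_arm(): the inverse, applied before the
 * fixed-up instruction is written back. */
static uint32_t opcode_to_mem_arm(uint32_t insn)
{
#if CPU_ENDIAN_BE8
    return swab32(insn);
#else
    return insn;
#endif
}
```

A relocation handler would then read with mem_to_opcode_arm(), patch the
offset bits, and write back with opcode_to_mem_arm(), making the endian
handling explicit rather than inferring it from SHF_EXECINSTR.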

> Index: linux-linaro-tracking/arch/arm/mm/Kconfig
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/mm/Kconfig
> +++ linux-linaro-tracking/arch/arm/mm/Kconfig
> @@ -695,6 +695,15 @@ config CPU_ENDIAN_BE32
>       help
>         Support for the BE-32 (big-endian) mode on pre-ARMv6 processors.
>
> +config LITTLE_ENDIAN_LOADER
> +        bool
> +        depends on CPU_BIG_ENDIAN
> +        default y
> +        help
> +          Set this option if Linux should operate in big-endian mode
> +          but the bootloader (e.g. U-Boot) still runs little-endian
> +          and passes the boot information in little-endian form.
> +
>   config CPU_HIGH_VECTOR
>       depends on !MMU&&  CPU_CP15&&  !CPU_ARM740T
>       bool "Select the High exception vector"


> Index: linux-linaro-tracking/arch/arm/mm/alignment.c
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/mm/alignment.c
> +++ linux-linaro-tracking/arch/arm/mm/alignment.c
> @@ -792,6 +792,10 @@ do_alignment(unsigned long addr, unsigne
>
>       regs->ARM_pc += isize;
>
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        instr = __swab32(instr); /* little endian instruction */
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +
>       switch (CODING_BITS(instr)) {
>       case 0x00000000:    /* 3.13.4 load/store instruction extensions */
>           if (LDSTHD_I_BIT(instr))

See comments on <asm/opcodes.h>

> Index: linux-linaro-tracking/arch/arm/kernel/head.S
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/head.S
> +++ linux-linaro-tracking/arch/arm/kernel/head.S
> @@ -89,6 +89,12 @@ ENTRY(stext)
>       @ ensure svc mode and all interrupts masked
>       safe_svcmode_maskall r9
>
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +    /*
> +    * Switch to bigendian mode
> +    */
> +    setend    be
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>       mrc    p15, 0, r9, c0, c0        @ get processor id
>       bl    __lookup_processor_type        @ r5=procinfo r9=cpuid
>       movs    r10, r5                @ invalid processor (r5=0)?
> @@ -356,6 +362,12 @@ ENTRY(secondary_startup)
>   #endif
>       safe_svcmode_maskall r9
>
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +    /*
> +    * Switch to bigendian mode
> +    */
> +    setend    be
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>       mrc    p15, 0, r9, c0, c0        @ get processor id
>       bl    __lookup_processor_type
>       movs    r10, r5                @ invalid processor?
> @@ -427,6 +439,9 @@ __enable_mmu:
>   #ifdef CONFIG_CPU_ICACHE_DISABLE
>       bic    r0, r0, #CR_I
>   #endif
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +    orr    r0, r0, #CR_EE    @ big-endian page tables
> +#endif
>   #ifdef CONFIG_ARM_LPAE
>       mov    r5, #0
>       mcrr    p15, 0, r4, r5, c2        @ load TTBR0
> @@ -592,8 +607,14 @@ __fixup_a_pv_table:
>       b    2f
>   1:    add     r7, r3
>       ldrh    ip, [r7, #2]
> +#ifdef __ARMEB__
> +        rev     ip, ip   @ byteswap instruction
> +#endif /* __ARMEB__ */
>       and    ip, 0x8f00
>       orr    ip, r6    @ mask in offset bits 31-24
> +#ifdef __ARMEB__
> +        rev     ip, ip   @ byteswap instruction
> +#endif /* __ARMEB__ */
>       strh    ip, [r7, #2]
>   2:    cmp    r4, r5
>       ldrcc    r7, [r4], #4    @ use branch for delay slot
> @@ -602,8 +623,14 @@ __fixup_a_pv_table:
>   #else
>       b    2f
>   1:    ldr    ip, [r7, r3]
> +#ifdef __ARMEB__
> +        rev     ip, ip   @ byteswap instruction
> +#endif /* __ARMEB__ */
>       bic    ip, ip, #0x000000ff
>       orr    ip, ip, r6    @ mask in offset bits 31-24
> +#ifdef __ARMEB__
> +        rev     ip, ip   @ byteswap instruction
> +#endif /* __ARMEB__ */
>       str    ip, [r7, r3]
>   2:    cmp    r4, r5
>       ldrcc    r7, [r4], #4    @ use branch for delay slot

Yeah, I'd prefer my macros for doing this.


> Index: linux-linaro-tracking/arch/arm/kernel/kprobes.c
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/kprobes.c
> +++ linux-linaro-tracking/arch/arm/kernel/kprobes.c
> @@ -66,14 +66,24 @@ int __kprobes arch_prepare_kprobe(struct
>       if (is_wide_instruction(insn)) {
>           insn <<= 16;
>           insn |= ((u16 *)addr)[1];
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +                insn = __swab32(insn); /* little endian instruction */
> +#endif
>           decode_insn = thumb32_kprobe_decode_insn;
> -    } else
> -        decode_insn = thumb16_kprobe_decode_insn;
> +    } else {
> +                decode_insn = thumb16_kprobe_decode_insn;
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +                insn = __swab16(insn); /* little endian instruction */
> +#endif
> +        }
>   #else /* !CONFIG_THUMB2_KERNEL */
>       thumb = false;
>       if (addr & 0x3)
>           return -EINVAL;
>       insn = *p->addr;
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        insn = __swab32(insn); /* little endian instruction */
> +#endif
>       decode_insn = arm_kprobe_decode_insn;
>   #endif
>
> @@ -89,7 +99,11 @@ int __kprobes arch_prepare_kprobe(struct
>           if (!p->ainsn.insn)
>               return -ENOMEM;
>           for (is = 0; is < MAX_INSN_SIZE; ++is)
> +#ifndef CONFIG_CPU_ENDIAN_BE8
>               p->ainsn.insn[is] = tmp_insn[is];
> +#else
> +                        p->ainsn.insn[is] = __swab32(tmp_insn[is]);
> /* little endian instruction */
> +#endif
>           flush_insns(p->ainsn.insn,
>                   sizeof(p->ainsn.insn[0]) * MAX_INSN_SIZE);
>           p->ainsn.insn_fn = (kprobe_insn_fn_t *)
> @@ -113,10 +127,17 @@ void __kprobes arch_arm_kprobe(struct kp
>           /* Remove any Thumb flag */
>           addr = (void *)((uintptr_t)p->addr & ~1);
>
> -        if (is_wide_instruction(p->opcode))
> +        if (is_wide_instruction(p->opcode)) {
>               brkp = KPROBE_THUMB32_BREAKPOINT_INSTRUCTION;
> -        else
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +                brkp = __swab32(brkp);
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +        } else {
>               brkp = KPROBE_THUMB16_BREAKPOINT_INSTRUCTION;
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +                brkp = __swab16(brkp);
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +            }
>       } else {
>           kprobe_opcode_t insn = p->opcode;
>
> @@ -127,6 +148,10 @@ void __kprobes arch_arm_kprobe(struct kp
>               brkp |= 0xe0000000;  /* Unconditional instruction */
>           else
>               brkp |= insn & 0xf0000000;  /* Copy condition from insn */
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        brkp = __swab32(brkp);
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +
>       }
>
>       patch_text(addr, brkp);
> @@ -144,8 +169,12 @@ int __kprobes __arch_disarm_kprobe(void
>   {
>       struct kprobe *kp = p;
>       void *addr = (void *)((uintptr_t)kp->addr & ~1);
> +    kprobe_opcode_t insn = kp->opcode;
>
> -    __patch_text(addr, kp->opcode);
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        insn = __swab32(insn); /* little endian instruction */
> +#endif
> +    __patch_text(addr, insn);
>
>       return 0;
>

__patch_text already does swapping, so I'm a little concerned that it
is being done twice in some places.
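The double-swap hazard flagged here is easy to model: byte-swapping is its own inverse, so if both the caller and __patch_text() convert the opcode, the two swaps cancel and the instruction lands in memory in CPU order rather than instruction order. A minimal sketch, with `to_mem_order()` standing in for whatever conversion __patch_text() performs internally on a BE8 kernel (an assumption, not the actual kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the opcode-to-memory conversion on a BE8 kernel. */
static uint32_t to_mem_order(uint32_t insn)
{
    return __builtin_bswap32(insn);
}

/* Model of __patch_text(): it converts the opcode itself, so the
 * caller must pass the opcode in canonical instruction order. */
static uint32_t patch_text_model(uint32_t insn)
{
    return to_mem_order(insn);
}
```

If the caller pre-swaps (as some of the kprobes hunks above do) and __patch_text() swaps again, the net result is an unswapped store: `patch_text_model(to_mem_order(insn))` returns `insn` unchanged, which is the wrong byte order for BE8 instruction memory.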


> Index: linux-linaro-tracking/arch/arm/kernel/traps.c
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/traps.c
> +++ linux-linaro-tracking/arch/arm/kernel/traps.c
> @@ -426,6 +426,10 @@ asmlinkage void __exception do_undefinst
>           goto die_sig;
>       }
>
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        instr = __swab32(instr); /* little endian instruction */
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +
>       if (call_undef_hook(regs, instr) == 0)
>           return;

already fixed.

> Index: linux-linaro-tracking/arch/arm/kernel/ftrace.c
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/ftrace.c
> +++ linux-linaro-tracking/arch/arm/kernel/ftrace.c
> @@ -100,10 +100,18 @@ static int ftrace_modify_code(unsigned l
>           if (probe_kernel_read(&replaced, (void *)pc, MCOUNT_INSN_SIZE))
>               return -EFAULT;
>
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        replaced = __swab32(replaced); /* little endian instruction */
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +
>           if (replaced != old)
>               return -EINVAL;
>       }
>
> +#ifdef CONFIG_CPU_ENDIAN_BE8
> +        new = __swab32(new); /* little endian instruction */
> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
> +
>       if (probe_kernel_write((void *)pc, &new, MCOUNT_INSN_SIZE))
>           return -EPERM;

Please see <asm/opcodes.h> for dealing with the op-codes like this.

> Index: linux-linaro-tracking/arch/arm/kernel/atags_parse.c
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/kernel/atags_parse.c
> +++ linux-linaro-tracking/arch/arm/kernel/atags_parse.c
> @@ -216,6 +216,10 @@ struct machine_desc * __init setup_machi
>       if (tags->hdr.tag != ATAG_CORE)
>           convert_to_tag_list(tags);
>   #endif
> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
> +    byteswap_tags(tags);
> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
> +
>       if (tags->hdr.tag != ATAG_CORE) {
>           early_print("Warning: Neither atags nor dtb found\n");
>           tags = (struct tag *)&default_tags;
> Index: linux-linaro-tracking/arch/arm/kernel/sleep.S

I really don't like this approach; it makes the data difficult to deal
with if you export it to user-space to then do things like md5sum it.
Also, if you're going to kexec() a new kernel, would you have to swap it
back before doing so?

> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h
> ===================================================================
> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h
> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h
> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a
>
>       __asm__ __volatile__("@ atomic64_add\n"
>   "1:    ldrexd    %0, %H0, [%3]\n"
> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>   "    adds    %0, %0, %4\n"
>   "    adc    %H0, %H0, %H4\n"
> +#else
> +"    adds    %H0, %H0, %H4\n"
> +"    adc    %0, %0, %4\n"
> +#endif
>   "    strexd    %1, %0, %H0, [%3]\n"
>   "    teq    %1, #0\n"
>   "    bne    1b"
> @@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6
>
>       __asm__ __volatile__("@ atomic64_add_return\n"
>   "1:    ldrexd    %0, %H0, [%3]\n"
> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>   "    adds    %0, %0, %4\n"
>   "    adc    %H0, %H0, %H4\n"
> +#else
> +"    adds    %H0, %H0, %H4\n"
> +"    adc    %0, %0, %4\n"
> +#endif
>   "    strexd    %1, %0, %H0, [%3]\n"
>   "    teq    %1, #0\n"
>   "    bne    1b"
> @@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a
>
>       __asm__ __volatile__("@ atomic64_sub\n"
>   "1:    ldrexd    %0, %H0, [%3]\n"
> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>   "    subs    %0, %0, %4\n"
>   "    sbc    %H0, %H0, %H4\n"
> +#else
> +"    subs    %H0, %H0, %H4\n"
> +"    sbc    %0, %0, %4\n"
> +#endif
>   "    strexd    %1, %0, %H0, [%3]\n"
>   "    teq    %1, #0\n"
>   "    bne    1b"
> @@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6
>
>       __asm__ __volatile__("@ atomic64_sub_return\n"
>   "1:    ldrexd    %0, %H0, [%3]\n"
> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>   "    subs    %0, %0, %4\n"
>   "    sbc    %H0, %H0, %H4\n"
> +#else
> +"    subs    %H0, %H0, %H4\n"
> +"    sbc    %0, %0, %4\n"
> +#endif
>   "    strexd    %1, %0, %H0, [%3]\n"
>   "    teq    %1, #0\n"
>   "    bne    1b"
> @@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi
>
>       __asm__ __volatile__("@ atomic64_dec_if_positive\n"
>   "1:    ldrexd    %0, %H0, [%3]\n"
> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>   "    subs    %0, %0, #1\n"
>   "    sbc    %H0, %H0, #0\n"
>   "    teq    %H0, #0\n"
> +#else
> +"    subs    %H0, %H0, #1\n"
> +"    sbc    %0, %0, #0\n"
> +"    teq    %0, #0\n"
> +#endif
>   "    bmi    2f\n"
>   "    strexd    %1, %0, %H0, [%3]\n"
>   "    teq    %1, #0\n"
> @@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at
>   "    teqeq    %H0, %H5\n"
>   "    moveq    %1, #0\n"
>   "    beq    2f\n"
> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>   "    adds    %0, %0, %6\n"
>   "    adc    %H0, %H0, %H6\n"
> +#else
> +"    adds    %H0, %H0, %H6\n"
> +"    adc    %0, %0, %6\n"
> +#endif
>   "    strexd    %2, %0, %H0, [%4]\n"
>   "    teq    %2, #0\n"
>   "    bne    1b\n"

I still believe that you're doing the wrong thing here
and that these can be left well enough alone. Please give
a concrete piece of code that can show they're actually
doing the wrong thing.



-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 13+ messages in thread

* updates for be8 patch series
  2013-07-23 18:30       ` Ben Dooks
@ 2013-07-24  1:06         ` Victor Kamensky
  2013-07-24  9:36           ` Will Deacon
  2013-07-24 10:46           ` Ben Dooks
  2013-07-24  7:31         ` Victor Kamensky
  1 sibling, 2 replies; 13+ messages in thread
From: Victor Kamensky @ 2013-07-24  1:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ben,

Let me split the atomic64 issue off from the other issues, so we have a focused topic here.

Please see the atomic64 issue test case below. Please try it in your setup.
I've run it on a Pandaboard ES with my set of patches, but I believe it should
be the same in your case.

<snip>

>> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h
>> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h
>> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a
>>
>>       __asm__ __volatile__("@ atomic64_add\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    adds    %0, %0, %4\n"
>>   "    adc    %H0, %H0, %H4\n"
>> +#else
>> +"    adds    %H0, %H0, %H4\n"
>> +"    adc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6
>>
>>       __asm__ __volatile__("@ atomic64_add_return\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    adds    %0, %0, %4\n"
>>   "    adc    %H0, %H0, %H4\n"
>> +#else
>> +"    adds    %H0, %H0, %H4\n"
>> +"    adc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a
>>
>>       __asm__ __volatile__("@ atomic64_sub\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    subs    %0, %0, %4\n"
>>   "    sbc    %H0, %H0, %H4\n"
>> +#else
>> +"    subs    %H0, %H0, %H4\n"
>> +"    sbc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6
>>
>>       __asm__ __volatile__("@ atomic64_sub_return\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    subs    %0, %0, %4\n"
>>   "    sbc    %H0, %H0, %H4\n"
>> +#else
>> +"    subs    %H0, %H0, %H4\n"
>> +"    sbc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi
>>
>>       __asm__ __volatile__("@ atomic64_dec_if_positive\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    subs    %0, %0, #1\n"
>>   "    sbc    %H0, %H0, #0\n"
>>   "    teq    %H0, #0\n"
>> +#else
>> +"    subs    %H0, %H0, #1\n"
>> +"    sbc    %0, %0, #0\n"
>> +"    teq    %0, #0\n"
>> +#endif
>>   "    bmi    2f\n"
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>> @@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at
>>   "    teqeq    %H0, %H5\n"
>>   "    moveq    %1, #0\n"
>>   "    beq    2f\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    adds    %0, %0, %6\n"
>>   "    adc    %H0, %H0, %H6\n"
>> +#else
>> +"    adds    %H0, %H0, %H6\n"
>> +"    adc    %0, %0, %6\n"
>> +#endif
>>   "    strexd    %2, %0, %H0, [%4]\n"
>>   "    teq    %2, #0\n"
>>   "    bne    1b\n"
>
>
> I still believe that you're doing the wrong thing here
> and that these can be left well enough alone. Please give
> a concrete piece of code that can show they're actually
> doing the wrong thing.

Disable CONFIG_GENERIC_ATOMIC64
-------------------------------

Make sure that CONFIG_GENERIC_ATOMIC64 is not set. Otherwise atomic64_add from
lib/atomic64.c will be used; that one works correctly, but then the atomic64_xxx
inline functions from arch/arm/include/asm/atomic.h are not exercised.

Personally, with my current kernel (close to 3.10) I failed to do that and
still get a live kernel, so to illustrate the issue I will use my own copy of
the inline atomic64_add function. Effectively it shows the same problem
that I tried to report.

Compiler used
-------------

gcc-linaro-arm-linux-gnueabihf-4.7-2013.01-20130125_linux

Test module source
------------------

#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/atomic.h>

static inline void my_atomic64_add_original(u64 i, atomic64_t *v)
{
    u64 result;
    unsigned long tmp;

    __asm__ __volatile__("@ atomic64_add\n"
"1:    ldrexd    %0, %H0, [%3]\n"
"    adds    %0, %0, %4\n"
"    adc    %H0, %H0, %H4\n"
"    strexd    %1, %0, %H0, [%3]\n"
"    teq    %1, #0\n"
"    bne    1b"
    : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
    : "r" (&v->counter), "r" (i)
    : "cc");
}

static inline void my_atomic64_add_fixed(u64 i, atomic64_t *v)
{
    u64 result;
    unsigned long tmp;

    __asm__ __volatile__("@ atomic64_add\n"
"1:    ldrexd    %0, %H0, [%3]\n"
#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
"    adds    %0, %0, %4\n"
"    adc    %H0, %H0, %H4\n"
#else
"    adds    %H0, %H0, %H4\n"
"    adc    %0, %0, %4\n"
#endif
"    strexd    %1, %0, %H0, [%3]\n"
"    teq    %1, #0\n"
"    bne    1b"
    : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
    : "r" (&v->counter), "r" (i)
    : "cc");
}


atomic64_t a = {0};

#define INCREMENT 0x40000000

void atomic_update_original (void)
{
    my_atomic64_add_original(INCREMENT, &a);
}

void atomic_update_fixed (void)
{
    my_atomic64_add_fixed(INCREMENT, &a);
}

void test_atomic64_original(void)
{
    int i;

    printk("original atomic64_add\n");

    atomic64_set(&a, 0);

    printk("a = %llx\n", a.counter);

    for (i = 0; i < 10; i++) {
        atomic_update_original();
        printk("i = %2d: a =  %llx\n", i, a.counter);
    }
}

void test_atomic64_fixed(void)
{
    int i;

    printk("fixed atomic64_add\n");

    atomic64_set(&a, 0);

    printk("a = %llx\n", a.counter);

    for (i = 0; i < 10; i++) {
        atomic_update_fixed();
        printk("i = %2d: a =  %llx\n", i, a.counter);
    }
}

int
init_module (void)
{
    test_atomic64_original();
    test_atomic64_fixed();
    return 0;
}

void
cleanup_module(void)
{
}

MODULE_LICENSE("GPL");


Test run
--------

Compile and run the module, and observe that with the original code the
counter is not incremented correctly.

root@genericarmv7ab:~# insmod atomic64.ko
[   73.041107] original atomic64_add
[   73.044647] a = 0
[   73.046691] i =  0: a =  40000000
[   73.050659] i =  1: a =  80000000
[   73.054199] i =  2: a =  c0000000
[   73.057983] i =  3: a =  0
[   73.060852] i =  4: a =  40000000
[   73.064361] i =  5: a =  80000000
[   73.067932] i =  6: a =  c0000000
[   73.071441] i =  7: a =  0
[   73.074310] i =  8: a =  40000000
[   73.077850] i =  9: a =  80000000
[   73.081390] fixed atomic64_add
[   73.084625] a = 0
[   73.086669] i =  0: a =  40000000
[   73.090209] i =  1: a =  80000000
[   73.093749] i =  2: a =  c0000000
[   73.097290] i =  3: a =  100000000
[   73.100891] i =  4: a =  140000000
[   73.104522] i =  5: a =  180000000
[   73.108154] i =  6: a =  1c0000000
[   73.111755] i =  7: a =  200000000
[   73.115386] i =  8: a =  240000000
[   73.119018] i =  9: a =  280000000


Disassemble of original code
----------------------------

(gdb) disassemble atomic_update_original
Dump of assembler code for function atomic_update_original:
   0x00000000 <+0>:    push    {r4}        ; (str r4, [sp, #-4]!)
   0x00000004 <+4>:    mov    r3, #1073741824    ; 0x40000000
   0x00000008 <+8>:    ldr    r12, [pc, #32]    ; 0x30
<atomic_update_original+48>
   0x0000000c <+12>:    mov    r2, #0
   0x00000010 <+16>:    ldrexd    r0, [r12]
   0x00000014 <+20>:    adds    r0, r0, r2
   0x00000018 <+24>:    adc    r1, r1, r3
   0x0000001c <+28>:    strexd    r4, r0, [r12]
   0x00000020 <+32>:    teq    r4, #0
   0x00000024 <+36>:    bne    0x10 <atomic_update_original+16>
   0x00000028 <+40>:    ldmfd    sp!, {r4}
   0x0000002c <+44>:    bx    lr
   0x00000030 <+48>:    andeq    r0, r0, r0
End of assembler dump.

Disassemble of fixed code
-------------------------

(gdb) disassemble atomic_update_fixed
Dump of assembler code for function atomic_update_fixed:
   0x00000034 <+0>:    push    {r4}        ; (str r4, [sp, #-4]!)
   0x00000038 <+4>:    mov    r3, #1073741824    ; 0x40000000
   0x0000003c <+8>:    ldr    r12, [pc, #32]    ; 0x64 <atomic_update_fixed+48>
   0x00000040 <+12>:    mov    r2, #0
   0x00000044 <+16>:    ldrexd    r0, [r12]
   0x00000048 <+20>:    adds    r1, r1, r3
   0x0000004c <+24>:    adc    r0, r0, r2
   0x00000050 <+28>:    strexd    r4, r0, [r12]
   0x00000054 <+32>:    teq    r4, #0
   0x00000058 <+36>:    bne    0x44 <atomic_update_fixed+16>
   0x0000005c <+40>:    ldmfd    sp!, {r4}
   0x00000060 <+44>:    bx    lr
   0x00000064 <+48>:    andeq    r0, r0, r0
End of assembler dump.

Fixed code explanation
----------------------

- $r2 has the 4 most significant bytes of the increment (0)
- $r3 has the 4 least significant bytes of the increment (0x40000000)

- Load the 'long long' at address $r12 into $r0 and $r1:
  $r0 gets the 4 most significant bytes (i.e. the 4 bytes at $r12)
  $r1 gets the 4 least significant bytes (i.e. the 4 bytes at $r12+4)
  loads are big endian, so $r0 and $r1 are read with the proper BE swap

   0x00000044 <+16>:    ldrexd    r0, [r12]

- add $r3 to $r1, storing the result in $r1; the carry flag may be set
- this adds the 4 least significant bytes of the 'long long'

   0x00000048 <+20>:    adds    r1, r1, r3

- add with carry the most significant parts of the increment and counter

   0x0000004c <+24>:    adc    r0, r0, r2

- Store the result back:
  $r0 goes to address $r12
  $r1 goes to address $r12+4

   0x00000050 <+28>:    strexd    r4, r0, [r12]

Ldrd/strd
---------

The issue boils down to the fact that the ldrexd/ldrd and strexd/strd
instructions in big-endian mode byte-swap only within each 4-byte word; they
do not swap the register numbers themselves. I.e. ldrexd rN, <addr> puts the
swapped 4 bytes at <addr> into rN, and the swapped 4 bytes at <addr+4> into rN+1
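This register-pair behaviour can be modelled in plain C: each 32-bit word is loaded in its natural big-endian byte order, but rN always takes the word at the lower address, so on a big-endian system rN holds the MOST significant half of the 64-bit value. A minimal sketch (the function name is mine; it only models the load semantics, not the exclusive-monitor behaviour):

```c
#include <assert.h>
#include <stdint.h>

/* Model of "ldrexd rN, rN+1, [addr]" on a big-endian ARM CPU:
 * each word is read big-endian from memory, and the register pair
 * is assigned strictly by address order, so rN receives the word at
 * the lower address -- the high half of the 64-bit value on BE. */
static void ldrexd_be_model(const uint8_t mem[8],
                            uint32_t *rN, uint32_t *rN1)
{
    /* word at <addr> -> rN (big-endian byte order within the word) */
    *rN  = ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8)  |  (uint32_t)mem[3];
    /* word at <addr+4> -> rN+1 */
    *rN1 = ((uint32_t)mem[4] << 24) | ((uint32_t)mem[5] << 16) |
           ((uint32_t)mem[6] << 8)  |  (uint32_t)mem[7];
}
```

For the counter value 0x0000000140000000 stored big-endian, rN ends up holding 0x00000001 and rN+1 holding 0x40000000, which is why the carry-producing `adds` must operate on rN+1 (the %H operand) in BE mode.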

Compiler?
---------

One may wonder whether we have a compiler bug here. And after all, what is the
meaning of %0 vs %H0? I asked my local gcc expert for help. Here is what
I got. Please look at

https://github.com/mirrors/gcc/blob/master/gcc/config/arm/arm.c

and search for >'Q', 'R' and 'H'<

according to this code, '%HN' is always the reg+1 operand regardless of whether
it is a big-endian system or not, unlike 'Q'.

Better fix
----------

In fact, I now believe the correct fix should use 'Q' for the least significant
4 bytes, 'R' for the most significant 4 bytes, and keep 'H' only in the
load/store instructions.

It should look something like this:

static inline void my_atomic64_add_fixed1(u64 i, atomic64_t *v)
{
    u64 result;
    unsigned long tmp;

    __asm__ __volatile__("@ atomic64_add\n"
"1:    ldrexd    %0, %H0, [%3]\n"
"    adds    %Q0, %Q0, %Q4\n"
"    adc    %R0, %R0, %R4\n"
"    strexd    %1, %0, %H0, [%3]\n"
"    teq    %1, #0\n"
"    bne    1b"
    : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
    : "r" (&v->counter), "r" (i)
    : "cc");
}

It is better than my original fix because it has no conditional
compilation. I've tested it in the module; it works with both the LE and BE
kernel variants.

Thanks,
Victor

^ permalink raw reply	[flat|nested] 13+ messages in thread

* updates for be8 patch series
  2013-07-23 18:30       ` Ben Dooks
  2013-07-24  1:06         ` Victor Kamensky
@ 2013-07-24  7:31         ` Victor Kamensky
  2013-07-24 21:24           ` Victor Kamensky
  1 sibling, 1 reply; 13+ messages in thread
From: Victor Kamensky @ 2013-07-24  7:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ben,

Please see inline

On 23 July 2013 11:30, Ben Dooks <ben.dooks@codethink.co.uk> wrote:
> On 23/07/13 19:03, Victor Kamensky wrote:
>>
>> Hi Ben,
>>
>> Sorry for multiple post. I am new to this, and I could not get mailing
>> list to accept my original email.
>> For sake of others so people can see discussed patch, here it goes
>> again with patch file inlined
>> now. Hope it will this time
>>
>> Wrt BE8 little-endian instructions you will need to fix a couple more
>> places:
>>      ftrace arch/arm/kernel/ftrace.c
>>      kprobes arch/arm/kernel/kprobes.c
>> Also in big endian mode the atomic64_xxx functions will need fixes,
>> otherwise perf counters will be truncated at a max of 32-bit values
>>      atomic64 arch/arm/include/asm/atomic.h
>>
>> I've attached the board-independent (core) patch from my work that made
>> the Pandaboard ES kernel run in BE mode. You can check my version of the
>> fixes for the above issues in the patch; look for the corresponding file
>> changes. Please rework them according to the style and rules of your
>> patch series. Note the patch itself contains fixes for a few other
>> issues that are already fixed in your series. I'll cross-check and
>> compare the individual areas later. I think you may find the attached
>> patch interesting as well; it was developed independently from yours but
>> matches it very well.
>> Wrt testing: ftrace was tested on Pandaboard ES; kprobes changes were
>> tested with SystemTap at some point in the patch's history on Pandaboard
>> ES (not thumb mode though), but they may have deteriorated while being
>> ported across kernel versions, so it is a good idea to retest the latest
>> patch. atomic64 was tested on Pandaboard ES and Marvell Armada XP.
>>
>> Thanks,
>> Victor
>
>
> First note: please always format your emails (where possible) to
> under 80 characters per line so that they do not get mangled when
> viewed on limited screens (for example, I often read email via a
> 24x80 terminal).

Yes, I understand. Sorry about this. Please bear with me a bit; I am new to
Linaro, just starting my second week. I'm still learning the tools used here,
and still struggling with the default email client.

> Second note, when producing patches, it is better to try and split
> them into a series of patches to reach a goal than just one large
> patch. It keeps a lot of useful information (see patch comments)
> and it means that if there is a problem, then bits can be lifted
> out to check.

Yes, I understand. Indeed, multiple patches is the best form. But I did not
really mean to contribute these patches as-is, rather to throw the patch over
the wall for reference.

I believe your series largely covers the same things, in a more accurate way,
and should be continued. I plan to review your series and will try to
integrate it and run it on a Pandaboard ES. Note that for this testing I
would use a big patch that makes the TI OMAP4 drivers endian-neutral. It
will take me several days; if I find anything during that time I will let
you know.

Note my original experiment was done against the TI 3.1 kernel and was
carried forward after that without great care, which may explain a few
discrepancies below and my blunder with the ftrace comment.

>
>
>> Index: linux-linaro-tracking/arch/arm/Makefile
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/Makefile
>> +++ linux-linaro-tracking/arch/arm/Makefile
>> @@ -16,6 +16,7 @@ LDFLAGS        :=
>>   LDFLAGS_vmlinux    :=-p --no-undefined -X
>>   ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
>>   LDFLAGS_vmlinux    += --be8
>> +KBUILD_LDFLAGS_MODULE += --be8
>>   endif
>>
>>   OBJCOPYFLAGS    :=-O binary -R .comment -S
>> @@ -43,12 +44,19 @@ ifeq ($(CONFIG_CC_STACKPROTECTOR),y)
>>   KBUILD_CFLAGS    +=-fstack-protector
>>   endif
>>
>> +# Passing endian compilation flag in both CPPFLAGS and CFLAGS. There are
>> +# places where CFLAGS alone are used (cmd_record_mcount). And it has to
>> +# be in CPPFLAGS because it affects what macros (__ARMEB__,
>> __BYTE_ORDER__,
>> +# __FLOAT_WORD_ORDER__) are defined. So replication of flag seems to be
>> +# good compromise
>>   ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
>>   KBUILD_CPPFLAGS    += -mbig-endian
>> +KBUILD_CFLAGS    += -mbig-endian
>>   AS        += -EB
>>   LD        += -EB
>>   else
>>   KBUILD_CPPFLAGS    += -mlittle-endian
>> +KBUILD_CFLAGS    += -mlittle-endian
>>   AS        += -EL
>>   LD        += -EL
>>   endif
>
>
> See the comment below.
>
>
>> Index: linux-linaro-tracking/arch/arm/boot/compressed/Makefile
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/boot/compressed/Makefile
>> +++ linux-linaro-tracking/arch/arm/boot/compressed/Makefile
>> @@ -138,6 +138,24 @@ endif
>>   ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
>>   LDFLAGS_vmlinux += --be8
>>   endif
>> +
>> +# Passing endian compilation flag in both CPPFLAGS and CFLAGS. There are
>> +# places where CFLAGS alone are used (cmd_record_mcount). And it has to
>> +# be in CPPFLAGS because it affects what macros (__ARMEB__,
>> __BYTE_ORDER__,
>> +# __FLOAT_WORD_ORDER__) are defined. So replication of flag seems to be
>> +# good compromise
>> +ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
>> +KBUILD_CPPFLAGS    += -mbig-endian
>> +KBUILD_CFLAGS    += -mbig-endian
>> +AS        += -EB
>> +LD        += -EB
>> +else
>> +KBUILD_CPPFLAGS    += -mlittle-endian
>> +KBUILD_CFLAGS    += -mlittle-endian
>> +AS        += -EL
>> +LD        += -EL
>> +endif
>> +
>>   # ?
>>   LDFLAGS_vmlinux += -p
>>   # Report unresolved symbol references
>
>
> I've not been bitten by this one. Is there anyone else who can comment
> on KBUILD_CFLAGS vs KBUILD_CPPFLAGS?
>
> I had assumed that arch/arm/boot/compressed/Makefile would have
> inherited the flags from the arch/arm/Makefile. Someone else would
> be better qualified to comment on this.

Let's ignore this. I don't think it is relevant now; there was probably some
odd situation in the old kernel.

>
>> Index: linux-linaro-tracking/arch/arm/boot/compressed/head.S
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/boot/compressed/head.S
>> +++ linux-linaro-tracking/arch/arm/boot/compressed/head.S
>> @@ -141,6 +141,12 @@ start:
>>   #endif
>>           mov    r7, r1            @ save architecture ID
>>           mov    r8, r2            @ save atags pointer
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +                /*
>> +                 * Switch to bigendian mode
>> +                 */
>> +                setend  be
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>>
>>   #ifndef __ARM_ARCH_2__
>>           /*
>> Index: linux-linaro-tracking/arch/arm/include/asm/setup.h
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/include/asm/setup.h
>> +++ linux-linaro-tracking/arch/arm/include/asm/setup.h
>> @@ -53,4 +53,8 @@ extern int arm_add_memory(phys_addr_t st
>>   extern void early_print(const char *str, ...);
>>   extern void dump_machine_table(void);
>>
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +void byteswap_tags(struct tag *t);
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>> +
>>   #endif
>
>
> You could have done
>
> #ifdef CONFIG_LITTLE_ENDIAN_LOADER
> void byteswap_tags(struct tag *t);
> #else
> static inline void byteswap_tags(struct tag *t) { }
> #endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>
>
>
>> Index: linux-linaro-tracking/arch/arm/kernel/Makefile
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/Makefile
>> +++ linux-linaro-tracking/arch/arm/kernel/Makefile
>> @@ -22,6 +22,7 @@ obj-y        := elf.o entry-armv.o entry-commo
>>   obj-$(CONFIG_ATAGS)        += atags_parse.o
>>   obj-$(CONFIG_ATAGS_PROC)    += atags_proc.o
>>   obj-$(CONFIG_DEPRECATED_PARAM_STRUCT) += atags_compat.o
>> +obj-$(CONFIG_LITTLE_ENDIAN_LOADER) += atags_byteswap.o
>>
>>   obj-$(CONFIG_OC_ETM)        += etm.o
>>   obj-$(CONFIG_CPU_IDLE)        += cpuidle.o
>> Index: linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c
>> ===================================================================
>> --- /dev/null
>> +++ linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c
>> @@ -0,0 +1,119 @@
>> +#include <linux/slab.h>
>> +#include <linux/proc_fs.h>
>> +#include <linux/swab.h>
>> +#include <asm/setup.h>
>> +#include <asm/types.h>
>> +#include <asm/page.h>
>> +
>> +#define byteswap32(x) (x) = __swab32((x))
>> +#define byteswap16(x) (x) = __swab16((x))
>> +
>> +static void __init byteswap_tag_core(struct tag *tag)
>> +{
>> +  byteswap32(tag->u.core.flags);
>> +  byteswap32(tag->u.core.pagesize);
>> +  byteswap32(tag->u.core.rootdev);
>> +}
>> +
>> +static void __init byteswap_tag_mem32(struct tag *tag)
>> +{
>> +  byteswap32(tag->u.mem.size);
>> +  byteswap32(tag->u.mem.start);
>> +}
>> +
>> +static void __init byteswap_tag_videotext(struct tag *tag)
>> +{
>> +  byteswap16(tag->u.videotext.video_page);
>> +  byteswap16(tag->u.videotext.video_ega_bx);
>> +  byteswap16(tag->u.videotext.video_points);
>> +}
>> +
>> +static void __init byteswap_tag_ramdisk(struct tag *tag)
>> +{
>> +  byteswap32(tag->u.ramdisk.flags);
>> +  byteswap32(tag->u.ramdisk.size);
>> +  byteswap32(tag->u.ramdisk.start);
>> +}
>> +
>> +static void __init byteswap_tag_initrd(struct tag *tag)
>> +{
>> +  byteswap32(tag->u.initrd.start);
>> +  byteswap32(tag->u.initrd.size);
>> +}
>> +
>> +static void __init byteswap_tag_serialnr(struct tag *tag)
>> +{
>> +  byteswap32(tag->u.serialnr.low);
>> +  byteswap32(tag->u.serialnr.high);
>> +}
>> +
>> +static void __init byteswap_tag_revision(struct tag *tag)
>> +{
>> +    byteswap32(tag->u.revision.rev);
>> +}
>> +
>> +static void __init byteswap_tag_videolfb(struct tag *tag)
>> +{
>> +    byteswap16(tag->u.videolfb.lfb_width);
>> +    byteswap16(tag->u.videolfb.lfb_height);
>> +    byteswap16(tag->u.videolfb.lfb_depth);
>> +    byteswap16(tag->u.videolfb.lfb_linelength);
>> +    byteswap32(tag->u.videolfb.lfb_base);
>> +    byteswap32(tag->u.videolfb.lfb_size);
>> +}
>> +
>> +static void __init byteswap_tag_acorn(struct tag *tag)
>> +{
>> +    byteswap32(tag->u.acorn.memc_control_reg);
>> +    byteswap32(tag->u.acorn.vram_pages);
>> +}
>> +
>> +static void __init byteswap_tag_memclk(struct tag *tag)
>> +{
>> +    byteswap32(tag->u.memclk.fmemclk);
>> +}
>> +
>> +void __init byteswap_tags(struct tag *t)
>> +{
>> +  for (; t->hdr.size; t = tag_next(t)) {
>> +    byteswap32(t->hdr.size);
>> +    byteswap32(t->hdr.tag);
>> +    switch(t->hdr.tag) {
>> +    case ATAG_CORE:
>> +      byteswap_tag_core(t);
>> +      break;
>> +    case ATAG_MEM:
>> +      byteswap_tag_mem32(t);
>> +      break;
>> +    case ATAG_VIDEOTEXT:
>> +      byteswap_tag_videotext(t);
>> +      break;
>> +    case ATAG_RAMDISK:
>> +      byteswap_tag_ramdisk(t);
>> +      break;
>> +    case ATAG_INITRD2:
>> +      byteswap_tag_initrd(t);
>> +      break;
>> +    case ATAG_SERIAL:
>> +      byteswap_tag_serialnr(t);
>> +      break;
>> +    case ATAG_REVISION:
>> +      byteswap_tag_revision(t);
>> +      break;
>> +    case ATAG_VIDEOLFB:
>> +      byteswap_tag_videolfb(t);
>> +      break;
>> +    case ATAG_CMDLINE:
>> +      break;
>> +    case ATAG_ACORN:
>> +      byteswap_tag_acorn(t);
>> +      break;
>> +    case ATAG_MEMCLK:
>> +      byteswap_tag_memclk(t);
>> +      break;
>> +    default:
>> +      /* Need to complain? */
>> +      break;
>> +    }
>> +  }
>> +}
>
>
> I think swapping on read is a better idea, as it leaves
> these in their original format.
>
> PS, looks like you either have a formatting issue with
> your email client or your code. Did this pass checkpatch?
>
>
>> Index: linux-linaro-tracking/arch/arm/kernel/head-common.S
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/head-common.S
>> +++ linux-linaro-tracking/arch/arm/kernel/head-common.S
>> @@ -48,6 +48,9 @@ __vet_atags:
>>       bne    1f
>>
>>       ldr    r5, [r2, #0]
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +        rev     r5, r5
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>>   #ifdef CONFIG_OF_FLATTREE
>>       ldr    r6, =OF_DT_MAGIC        @ is it a DTB?
>>       cmp    r5, r6
>> @@ -57,6 +60,9 @@ __vet_atags:
>>       cmpne    r5, #ATAG_CORE_SIZE_EMPTY
>>       bne    1f
>>       ldr    r5, [r2, #4]
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +        rev     r5, r5
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>>       ldr    r6, =ATAG_CORE
>>       cmp    r5, r6
>>       bne    1f
>
>
> bit-untidy. also, does OF_DT_MAGIC get changed by the
> endian-ness, or is that somewhere else in the head code?
>
>> Index: linux-linaro-tracking/arch/arm/kernel/module.c
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/module.c
>> +++ linux-linaro-tracking/arch/arm/kernel/module.c
>> @@ -45,6 +45,65 @@ void *module_alloc(unsigned long size)
>>   }
>>   #endif
>>
>> +/*
>> + * For relocations in executable sections, if we are configured with
>> + * CONFIG_CPU_ENDIAN_BE8, we assume that the code is already in
>> + * little-endian (byteswapped) form and we need to byteswap the value
>> + * both when we read and when we write. If the relocation is of type
>> + * R_ARM_ABS32 we assume it relocates a piece of data, and we do not
>> + * perform byteswaps when such a relocation is applied.
>> + * This heuristic-based approach may break in the future.
>> + *
>> + * It seems that a more correct way to handle be8 in kernel modules is
>> + * to build the module without the --be8 option, but to byteswap the
>> + * instructions when the module sections are loaded, after all
>> + * relocations have been applied. In order to do such a byteswap we
>> + * need to read all mapping symbols ($a, $d, $t), sort them, and apply
>> + * byteswaps to the $a and $t entries.
>> + */
>> +static inline u32 read_section_u32(Elf32_Shdr *dstsec, unsigned long loc)
>> +{
>> +       u32 retval = *(u32 *)loc;
>> +
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        if (dstsec->sh_flags&  SHF_EXECINSTR) {
>>
>> +                retval = __swab32(retval);
>> +        }
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +        return retval;
>> +}
>> +
>> +static inline void write_section_u32(Elf32_Shdr *dstsec, unsigned
>> long loc, u32 value)
>> +{
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        if (dstsec->sh_flags&  SHF_EXECINSTR) {
>>
>> +                value = __swab32(value);
>> +        }
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +        *(u32 *) loc = value;
>> +}
>> +
>> +#ifdef CONFIG_THUMB2_KERNEL
>> +static inline u16 read_section_u16(Elf32_Shdr *dstsec, unsigned long loc)
>> +{
>> +        u16 retval = *(u16 *) loc;
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        if (dstsec->sh_flags&  SHF_EXECINSTR) {
>>
>> +                retval = __swab16(retval);
>> +        }
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +        return retval;
>> +}
>> +
>> +static inline void write_section_u16(Elf32_Shdr *dstsec, unsigned
>> long loc, u16 value)
>> +{
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        if (dstsec->sh_flags&  SHF_EXECINSTR) {
>>
>> +                value = __swab16(value);
>> +        }
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +        *(u16 *) loc = value;
>> +}
>> +#endif /* CONFIG_THUMB2_KERNEL */
>> +
>>   int
>>   apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int
>> symindex,
>>              unsigned int relindex, struct module *module)
>> @@ -60,6 +119,7 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>>           Elf32_Sym *sym;
>>           const char *symname;
>>           s32 offset;
>> +                u32 loc_u32;
>>   #ifdef CONFIG_THUMB2_KERNEL
>>           u32 upper, lower, sign, j1, j2;
>>   #endif
>> @@ -95,7 +155,8 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>>           case R_ARM_PC24:
>>           case R_ARM_CALL:
>>           case R_ARM_JUMP24:
>> -            offset = (*(u32 *)loc&  0x00ffffff)<<  2;
>>
>> +                        loc_u32 = read_section_u32(dstsec, loc);
>> +            offset = (loc_u32&  0x00ffffff)<<  2;
>>               if (offset&  0x02000000)
>>
>>                   offset -= 0x04000000;
>>
>> @@ -112,27 +173,33 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>>
>>               offset>>= 2;
>>
>> -            *(u32 *)loc&= 0xff000000;
>>
>> -            *(u32 *)loc |= offset&  0x00ffffff;
>> +            loc_u32&= 0xff000000;
>> +            loc_u32 |= offset&  0x00ffffff;
>>
>> +                        write_section_u32(dstsec, loc, loc_u32);
>>               break;
>>
>> -           case R_ARM_V4BX:
>> +                case R_ARM_V4BX:
>> +                        loc_u32 = read_section_u32(dstsec, loc);
>>                  /* Preserve Rm and the condition code. Alter
>>               * other bits to re-code instruction as
>>               * MOV PC,Rm.
>>               */
>> -               *(u32 *)loc&= 0xf000000f;
>>
>> -               *(u32 *)loc |= 0x01a0f000;
>> +               loc_u32&= 0xf000000f;
>>
>> +               loc_u32 |= 0x01a0f000;
>> +                       write_section_u32(dstsec, loc, loc_u32);
>>                  break;
>>
>>           case R_ARM_PREL31:
>> -            offset = *(u32 *)loc + sym->st_value - loc;
>> -            *(u32 *)loc = offset&  0x7fffffff;
>>
>> +                        loc_u32 = read_section_u32(dstsec, loc);
>> +            offset = loc_u32 + sym->st_value - loc;
>> +            loc_u32 = offset&  0x7fffffff;
>>
>> +                        write_section_u32(dstsec, loc, loc_u32);
>>               break;
>>
>>           case R_ARM_MOVW_ABS_NC:
>>           case R_ARM_MOVT_ABS:
>> -            offset = *(u32 *)loc;
>> +                        loc_u32 = read_section_u32(dstsec, loc);
>> +            offset = loc_u32;
>>               offset = ((offset&  0xf0000)>>  4) | (offset&  0xfff);
>>
>>               offset = (offset ^ 0x8000) - 0x8000;
>>
>> @@ -140,16 +207,17 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>>               if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS)
>>                   offset>>= 16;
>>
>> -            *(u32 *)loc&= 0xfff0f000;
>>
>> -            *(u32 *)loc |= ((offset&  0xf000)<<  4) |
>> -                    (offset&  0x0fff);
>> +            loc_u32&= 0xfff0f000;
>> +            loc_u32 |= ((offset&  0xf000)<<  4) |
>> +                                   (offset&  0x0fff);
>>
>> +                        write_section_u32(dstsec, loc, loc_u32);
>>               break;
>>
>>   #ifdef CONFIG_THUMB2_KERNEL
>>           case R_ARM_THM_CALL:
>>           case R_ARM_THM_JUMP24:
>> -            upper = *(u16 *)loc;
>> -            lower = *(u16 *)(loc + 2);
>> +                        upper = read_section_u16(dstsec, loc);
>> +            lower = read_section_u16(dstsec, loc + 2);
>>
>>               /*
>>                * 25 bit signed address range (Thumb-2 BL and B.W
>> @@ -198,17 +266,19 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>>               sign = (offset>>  24)&  1;
>>               j1 = sign ^ (~(offset>>  23)&  1);
>>               j2 = sign ^ (~(offset>>  22)&  1);
>> -            *(u16 *)loc = (u16)((upper&  0xf800) | (sign<<  10) |
>> -                        ((offset>>  12)&  0x03ff));
>> -            *(u16 *)(loc + 2) = (u16)((lower&  0xd000) |
>>
>> -                          (j1<<  13) | (j2<<  11) |
>> -                          ((offset>>  1)&  0x07ff));
>> +                        write_section_u16(dstsec, loc,
>> +                                          (u16)((upper&  0xf800) |
>> (sign<<  10) |
>> +                                                ((offset>>  12)&
>> 0x03ff)));
>>
>> +                        write_section_u16(dstsec, loc + 2,
>> +                                          (u16)((lower&  0xd000) |
>>
>> +                                                (j1<<  13) | (j2<<  11) |
>> +                                                ((offset>>  1)&
>> 0x07ff)));
>>
>>               break;
>>
>>           case R_ARM_THM_MOVW_ABS_NC:
>>           case R_ARM_THM_MOVT_ABS:
>> -            upper = *(u16 *)loc;
>> -            lower = *(u16 *)(loc + 2);
>> +                        upper = read_section_u16(dstsec, loc);
>> +            lower = read_section_u16(dstsec, loc + 2);
>>
>>               /*
>>                * MOVT/MOVW instructions encoding in Thumb-2:
>> @@ -229,12 +299,14 @@ apply_relocate(Elf32_Shdr *sechdrs, cons
>>               if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_ABS)
>>                   offset>>= 16;
>>
>> -            *(u16 *)loc = (u16)((upper&  0xfbf0) |
>> -                        ((offset&  0xf000)>>  12) |
>> -                        ((offset&  0x0800)>>  1));
>>
>> -            *(u16 *)(loc + 2) = (u16)((lower&  0x8f00) |
>> -                          ((offset&  0x0700)<<  4) |
>> -                          (offset&  0x00ff));
>>
>> +                        write_section_u16(dstsec, loc,
>> +                                          (u16)((upper&  0xfbf0) |
>> +                                                ((offset&  0xf000)>>  12)
>> |
>> +                                                ((offset&  0x0800)>>
>> 1)));
>>
>> +                        write_section_u16(dstsec, loc + 2,
>> +                                          (u16)((lower&  0x8f00) |
>> +                                                ((offset&  0x0700)<<  4)
>> |
>> +                                                (offset&  0x00ff)));
>>               break;
>>   #endif
>>
>
> I'm still not sure about this, I think the use of <asm/opcodes.h>
> is better here. Not sure if there's much point in checking the flags
> on the sections as I think the relocations pretty much define which
> endian they will be in.

Wrt asm/opcodes.h: yes, I agree it should be used now.

I'll need to check why I used section attributes to guide the byteswaps; I
think it was done for a reason. We tested this with a bigger system on top,
with quite a few different KMLs, and I think I ran into problems with some of
them. I will need to dig back. As my comment says, ideally it would be better
to use the $a, $d, $t mapping symbols; that is what ld does with the --be8
option, but it may be too hard as a starting point.
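To make that concrete, the mapping-symbol pass could look roughly like the
sketch below. It is a hypothetical, host-testable model only: the names are
made up, and real code would have to read the $a/$t/$d symbols from the
module's symbol table and sort them before applying the swaps.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of a mapping-symbol-guided byteswap pass for a
 * module section: $a regions hold ARM instructions (swap 32-bit
 * words), $t regions hold Thumb instructions (swap 16-bit halfwords),
 * $d regions hold data/literal pools (leave alone).  Real code would
 * collect and sort these symbols from the module's symtab; the names
 * below are made up for illustration.
 */

enum map_kind { MAP_ARM, MAP_THUMB, MAP_DATA };

struct map_sym {
	uint32_t offset;	/* region start, relative to section */
	enum map_kind kind;
};

static uint32_t swap32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

static uint16_t swap16(uint16_t x)
{
	return (uint16_t)((x >> 8) | (x << 8));
}

/* syms[] must be sorted by offset and cover the whole section */
static void be8_swap_section(uint8_t *sec, uint32_t size,
			     const struct map_sym *syms, size_t nsyms)
{
	size_t i;

	for (i = 0; i < nsyms; i++) {
		uint32_t off = syms[i].offset;
		uint32_t end = (i + 1 < nsyms) ? syms[i + 1].offset : size;

		if (syms[i].kind == MAP_ARM) {
			for (; off + 4 <= end; off += 4) {
				uint32_t v;
				memcpy(&v, sec + off, 4);
				v = swap32(v);
				memcpy(sec + off, &v, 4);
			}
		} else if (syms[i].kind == MAP_THUMB) {
			for (; off + 2 <= end; off += 2) {
				uint16_t v;
				memcpy(&v, sec + off, 2);
				v = swap16(v);
				memcpy(sec + off, &v, 2);
			}
		}
		/* MAP_DATA: literal pools and data stay untouched */
	}
}
```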

>> Index: linux-linaro-tracking/arch/arm/mm/Kconfig
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/mm/Kconfig
>> +++ linux-linaro-tracking/arch/arm/mm/Kconfig
>> @@ -695,6 +695,15 @@ config CPU_ENDIAN_BE32
>>       help
>>         Support for the BE-32 (big-endian) mode on pre-ARMv6 processors.
>>
>> +config LITTLE_ENDIAN_LOADER
>> +        bool
>> +        depends on CPU_BIG_ENDIAN
>> +        default y
>> +        help
>> +          If Linux should operate in big-endian mode but the boot
>> +          loader (e.g. U-Boot) still runs in little-endian mode and
>> +          passes boot information in little-endian form, set this
>> +          flag.
>> +
>>   config CPU_HIGH_VECTOR
>>       depends on !MMU&&  CPU_CP15&&  !CPU_ARM740T
>>
>>       bool "Select the High exception vector"
>
>
>
>> Index: linux-linaro-tracking/arch/arm/mm/alignment.c
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/mm/alignment.c
>> +++ linux-linaro-tracking/arch/arm/mm/alignment.c
>> @@ -792,6 +792,10 @@ do_alignment(unsigned long addr, unsigne
>>
>>       regs->ARM_pc += isize;
>>
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        instr = __swab32(instr); /* little endian instruction */
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +
>>       switch (CODING_BITS(instr)) {
>>       case 0x00000000:    /* 3.13.4 load/store instruction extensions */
>>           if (LDSTHD_I_BIT(instr))
>
>
> See comments on <asm/opcodes.h>
>
>
>> Index: linux-linaro-tracking/arch/arm/kernel/head.S
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/head.S
>> +++ linux-linaro-tracking/arch/arm/kernel/head.S
>> @@ -89,6 +89,12 @@ ENTRY(stext)
>>       @ ensure svc mode and all interrupts masked
>>       safe_svcmode_maskall r9
>>
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +    /*
>> +    * Switch to bigendian mode
>> +    */
>> +    setend    be
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>>       mrc    p15, 0, r9, c0, c0        @ get processor id
>>       bl    __lookup_processor_type        @ r5=procinfo r9=cpuid
>>       movs    r10, r5                @ invalid processor (r5=0)?
>> @@ -356,6 +362,12 @@ ENTRY(secondary_startup)
>>   #endif
>>       safe_svcmode_maskall r9
>>
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +    /*
>> +    * Switch to bigendian mode
>> +    */
>> +    setend    be
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>>       mrc    p15, 0, r9, c0, c0        @ get processor id
>>       bl    __lookup_processor_type
>>       movs    r10, r5                @ invalid processor?
>> @@ -427,6 +439,9 @@ __enable_mmu:
>>   #ifdef CONFIG_CPU_ICACHE_DISABLE
>>       bic    r0, r0, #CR_I
>>   #endif
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +    orr    r0, r0, #CR_EE    @ big-endian page tables
>> +#endif
>>   #ifdef CONFIG_ARM_LPAE
>>       mov    r5, #0
>>       mcrr    p15, 0, r4, r5, c2        @ load TTBR0
>> @@ -592,8 +607,14 @@ __fixup_a_pv_table:
>>       b    2f
>>   1:    add     r7, r3
>>       ldrh    ip, [r7, #2]
>> +#ifdef __ARMEB__
>> +        rev     ip, ip   @ byteswap instruction
>> +#endif /* __ARMEB__ */
>>       and    ip, 0x8f00
>>       orr    ip, r6    @ mask in offset bits 31-24
>> +#ifdef __ARMEB__
>> +        rev     ip, ip   @ byteswap instruction
>> +#endif /* __ARMEB__ */
>>       strh    ip, [r7, #2]
>>   2:    cmp    r4, r5
>>       ldrcc    r7, [r4], #4    @ use branch for delay slot
>> @@ -602,8 +623,14 @@ __fixup_a_pv_table:
>>   #else
>>       b    2f
>>   1:    ldr    ip, [r7, r3]
>> +#ifdef __ARMEB__
>> +        rev     ip, ip   @ byteswap instruction
>> +#endif /* __ARMEB__ */
>>       bic    ip, ip, #0x000000ff
>>       orr    ip, ip, r6    @ mask in offset bits 31-24
>> +#ifdef __ARMEB__
>> +        rev     ip, ip   @ byteswap instruction
>> +#endif /* __ARMEB__ */
>>       str    ip, [r7, r3]
>>   2:    cmp    r4, r5
>>       ldrcc    r7, [r4], #4    @ use branch for delay slot
>
>
> Yeah, prefer my macros for doing this.
>
>
>> Index: linux-linaro-tracking/arch/arm/kernel/kprobes.c
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/kprobes.c
>> +++ linux-linaro-tracking/arch/arm/kernel/kprobes.c
>> @@ -66,14 +66,24 @@ int __kprobes arch_prepare_kprobe(struct
>>       if (is_wide_instruction(insn)) {
>>           insn<<= 16;
>>           insn |= ((u16 *)addr)[1];
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +                insn = __swab32(insn); /* little endian instruction */
>> +#endif
>>           decode_insn = thumb32_kprobe_decode_insn;
>> -    } else
>> -        decode_insn = thumb16_kprobe_decode_insn;
>> +    } else {
>> +                decode_insn = thumb16_kprobe_decode_insn;
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +                insn = __swab16(insn); /* little endian instruction */
>> +#endif
>> +        }
>>   #else /* !CONFIG_THUMB2_KERNEL */
>>       thumb = false;
>>       if (addr&  0x3)
>>
>>           return -EINVAL;
>>       insn = *p->addr;
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        insn = __swab32(insn); /* little endian instruction */
>> +#endif
>>       decode_insn = arm_kprobe_decode_insn;
>>   #endif
>>
>> @@ -89,7 +99,11 @@ int __kprobes arch_prepare_kprobe(struct
>>           if (!p->ainsn.insn)
>>               return -ENOMEM;
>>           for (is = 0; is<  MAX_INSN_SIZE; ++is)
>> +#ifndef CONFIG_CPU_ENDIAN_BE8
>>               p->ainsn.insn[is] = tmp_insn[is];
>> +#else
>> +                        p->ainsn.insn[is] = __swab32(tmp_insn[is]);
>> /* little endian instruction */
>> +#endif
>>           flush_insns(p->ainsn.insn,
>>                   sizeof(p->ainsn.insn[0]) * MAX_INSN_SIZE);
>>           p->ainsn.insn_fn = (kprobe_insn_fn_t *)
>> @@ -113,10 +127,17 @@ void __kprobes arch_arm_kprobe(struct kp
>>           /* Remove any Thumb flag */
>>           addr = (void *)((uintptr_t)p->addr&  ~1);
>>
>>
>> -        if (is_wide_instruction(p->opcode))
>> +        if (is_wide_instruction(p->opcode)) {
>>               brkp = KPROBE_THUMB32_BREAKPOINT_INSTRUCTION;
>> -        else
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +                brkp = __swab32(brkp);
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +        } else {
>>               brkp = KPROBE_THUMB16_BREAKPOINT_INSTRUCTION;
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +                brkp = __swab16(brkp);
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +            }
>>       } else {
>>           kprobe_opcode_t insn = p->opcode;
>>
>> @@ -127,6 +148,10 @@ void __kprobes arch_arm_kprobe(struct kp
>>               brkp |= 0xe0000000;  /* Unconditional instruction */
>>           else
>>               brkp |= insn&  0xf0000000;  /* Copy condition from insn */
>>
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        brkp = __swab32(brkp);
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +
>>       }
>>
>>       patch_text(addr, brkp);
>> @@ -144,8 +169,12 @@ int __kprobes __arch_disarm_kprobe(void
>>   {
>>       struct kprobe *kp = p;
>>       void *addr = (void *)((uintptr_t)kp->addr&  ~1);
>>
>> +    kprobe_opcode_t insn = kp->opcode;
>>
>> -    __patch_text(addr, kp->opcode);
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        insn = __swab32(insn); /* little endian instruction */
>> +#endif
>> +    __patch_text(addr, insn);
>>
>>       return 0;
>>
>
> __patch_text already does swapping, so a little concerned that it
> is being done twice in some places.
>
>
>
>> Index: linux-linaro-tracking/arch/arm/kernel/traps.c
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/traps.c
>> +++ linux-linaro-tracking/arch/arm/kernel/traps.c
>> @@ -426,6 +426,10 @@ asmlinkage void __exception do_undefinst
>>           goto die_sig;
>>       }
>>
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        instr = __swab32(instr); /* little endian instruction */
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +
>>       if (call_undef_hook(regs, instr) == 0)
>>           return;
>
>
> already fixed.
>
>
>> Index: linux-linaro-tracking/arch/arm/kernel/ftrace.c
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/ftrace.c
>> +++ linux-linaro-tracking/arch/arm/kernel/ftrace.c
>> @@ -100,10 +100,18 @@ static int ftrace_modify_code(unsigned l
>>           if (probe_kernel_read(&replaced, (void *)pc, MCOUNT_INSN_SIZE))
>>               return -EFAULT;
>>
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        replaced = __swab32(replaced); /* little endian instruction */
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +
>>           if (replaced != old)
>>               return -EINVAL;
>>       }
>>
>> +#ifdef CONFIG_CPU_ENDIAN_BE8
>> +        new = __swab32(new); /* little endian instruction */
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */
>> +
>>       if (probe_kernel_write((void *)pc,&new, MCOUNT_INSN_SIZE))
>>           return -EPERM;
>
>
> Please see <asm/opcodes.h> for dealing with the op-codes like this.

Yes, agreed. It is not needed any more. In kernels 3.1/3.2 there were no
asm/opcodes.h macros (__opcode_to_mem_arm, etc.), so this was added back
then. It is incorrect now.
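The effect of those helpers is easy to model. The sketch below is a
simplified stand-in for what __mem_to_opcode_arm()/__opcode_to_mem_arm()
boil down to, not the kernel's actual implementation (hence the MODEL_
prefix on the config symbol):

```c
#include <stdint.h>

/*
 * Simplified model (not the kernel's real code): on a BE8 kernel,
 * instructions are stored little-endian while the CPU runs
 * big-endian, so converting between the in-memory form and the
 * canonical opcode form is a 32-bit byteswap.  On a little-endian
 * kernel both forms are identical and the helpers are no-ops.
 */
#ifdef MODEL_CPU_ENDIAN_BE8
static inline uint32_t mem_to_opcode_arm(uint32_t x)
{
	return __builtin_bswap32(x);
}
static inline uint32_t opcode_to_mem_arm(uint32_t x)
{
	return __builtin_bswap32(x);
}
#else
static inline uint32_t mem_to_opcode_arm(uint32_t x) { return x; }
static inline uint32_t opcode_to_mem_arm(uint32_t x) { return x; }
#endif
```

With helpers of this shape, the ftrace hunk above would read something like
`if (mem_to_opcode_arm(replaced) != old)` and write back
`opcode_to_mem_arm(new)`, with no #ifdef at each call site.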

>
>> Index: linux-linaro-tracking/arch/arm/kernel/atags_parse.c
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/kernel/atags_parse.c
>> +++ linux-linaro-tracking/arch/arm/kernel/atags_parse.c
>> @@ -216,6 +216,10 @@ struct machine_desc * __init setup_machi
>>       if (tags->hdr.tag != ATAG_CORE)
>>           convert_to_tag_list(tags);
>>   #endif
>> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER
>> +    byteswap_tags(tags);
>> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */
>> +
>>       if (tags->hdr.tag != ATAG_CORE) {
>>           early_print("Warning: Neither atags nor dtb found\n");
>>           tags = (struct tag *)&default_tags;
>> Index: linux-linaro-tracking/arch/arm/kernel/sleep.S
>
>
> I really don't like this way, it makes it difficult to deal with
> if you export this data to user-space to then do things like md5sum
> it. Also, if you're going to kexec() a new kernel, would you have
> to re-swap it back before this?

Agreed that swapping in place could be problematic.
I will look at this, starting with the patches you've done, and will get back
to you.
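A swap-on-read scheme might look roughly like this. It is a hypothetical
sketch (the names are made up); the point is that each field is converted as
it is read and the tag list itself is never modified:

```c
#include <stdint.h>

/*
 * Hypothetical swap-on-read accessors for an ATAG list handed over by
 * a little-endian loader to a big-endian kernel.  Because *t is never
 * written, a later kexec or a user-space md5sum of the exported tags
 * sees exactly the bytes the loader wrote.
 */
#ifdef MODEL_LITTLE_ENDIAN_LOADER
#define tag_u32(x)	__builtin_bswap32(x)
#define tag_u16(x)	__builtin_bswap16(x)
#else
#define tag_u32(x)	(x)
#define tag_u16(x)	(x)
#endif

struct model_tag_header {
	uint32_t size;		/* in words, as stored by the loader */
	uint32_t tag;
};

/* reads convert on the fly; *t stays in loader byte order */
static inline uint32_t tag_size_words(const struct model_tag_header *t)
{
	return tag_u32(t->size);
}

static inline uint32_t tag_id(const struct model_tag_header *t)
{
	return tag_u32(t->tag);
}
```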

>> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h
>> ===================================================================
>> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h
>> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h
>> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a
>>
>>       __asm__ __volatile__("@ atomic64_add\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    adds    %0, %0, %4\n"
>>   "    adc    %H0, %H0, %H4\n"
>> +#else
>> +"    adds    %H0, %H0, %H4\n"
>> +"    adc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6
>>
>>       __asm__ __volatile__("@ atomic64_add_return\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    adds    %0, %0, %4\n"
>>   "    adc    %H0, %H0, %H4\n"
>> +#else
>> +"    adds    %H0, %H0, %H4\n"
>> +"    adc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a
>>
>>       __asm__ __volatile__("@ atomic64_sub\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    subs    %0, %0, %4\n"
>>   "    sbc    %H0, %H0, %H4\n"
>> +#else
>> +"    subs    %H0, %H0, %H4\n"
>> +"    sbc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6
>>
>>       __asm__ __volatile__("@ atomic64_sub_return\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    subs    %0, %0, %4\n"
>>   "    sbc    %H0, %H0, %H4\n"
>> +#else
>> +"    subs    %H0, %H0, %H4\n"
>> +"    sbc    %0, %0, %4\n"
>> +#endif
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>>   "    bne    1b"
>> @@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi
>>
>>       __asm__ __volatile__("@ atomic64_dec_if_positive\n"
>>   "1:    ldrexd    %0, %H0, [%3]\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    subs    %0, %0, #1\n"
>>   "    sbc    %H0, %H0, #0\n"
>>   "    teq    %H0, #0\n"
>> +#else
>> +"    subs    %H0, %H0, #1\n"
>> +"    sbc    %0, %0, #0\n"
>> +"    teq    %0, #0\n"
>> +#endif
>>   "    bmi    2f\n"
>>   "    strexd    %1, %0, %H0, [%3]\n"
>>   "    teq    %1, #0\n"
>> @@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at
>>   "    teqeq    %H0, %H5\n"
>>   "    moveq    %1, #0\n"
>>   "    beq    2f\n"
>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>   "    adds    %0, %0, %6\n"
>>   "    adc    %H0, %H0, %H6\n"
>> +#else
>> +"    adds    %H0, %H0, %H6\n"
>> +"    adc    %0, %0, %6\n"
>> +#endif
>>   "    strexd    %2, %0, %H0, [%4]\n"
>>   "    teq    %2, #0\n"
>>   "    bne    1b\n"
>
>
> I still believe that you're doing the wrong thing here
> and that these can be left well enough alone. Please give
> a concrete piece of code that can show they're actually
> doing the wrong thing.

That one is addressed in separate email.

Thanks,
Victor

>
>
>
> --
> Ben Dooks                               http://www.codethink.co.uk/
> Senior Engineer                         Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 13+ messages in thread

* updates for be8 patch series
  2013-07-24  1:06         ` Victor Kamensky
@ 2013-07-24  9:36           ` Will Deacon
  2013-07-24 10:46           ` Ben Dooks
  1 sibling, 0 replies; 13+ messages in thread
From: Will Deacon @ 2013-07-24  9:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 24, 2013 at 02:06:05AM +0100, Victor Kamensky wrote:
> Hi Ben,
> 
> Let me split the atomic64 issue off from the others, so we have a focused
> topic here.
> 
> Please see the atomic64 test case below, and please try it in your setup.
> I've run it on a Pandaboard ES with my set of patches, but I believe it
> should be the same in your case.

[...]


> <snip>
> 
> >> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h
> >> ===================================================================
> >> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h
> >> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h
> >> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a
> >>
> >>       __asm__ __volatile__("@ atomic64_add\n"
> >>   "1:    ldrexd    %0, %H0, [%3]\n"
> >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
> >>   "    adds    %0, %0, %4\n"
> >>   "    adc    %H0, %H0, %H4\n"
> >> +#else
> >> +"    adds    %H0, %H0, %H4\n"
> >> +"    adc    %0, %0, %4\n"
> >> +#endif

I thought you could just use the %Q and %R output modifiers to get the
appropriate halves of the 64-bit operand, then you can avoid the #ifdef.
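Something like this, i.e. an untested sketch with %Q as the least
significant and %R as the most significant half of the 64-bit operand
(types u64 and atomic64_t as in the kernel's atomic.h):

```c
/*
 * Untested sketch: %Q selects the least significant and %R the most
 * significant half of a 64-bit operand, so the carry chain is correct
 * for either endianness with no #ifdef.
 */
static inline void atomic64_add(u64 i, atomic64_t *v)
{
	u64 result;
	unsigned long tmp;

	__asm__ __volatile__("@ atomic64_add\n"
"1:	ldrexd	%0, %H0, [%3]\n"
"	adds	%Q0, %Q0, %Q4\n"
"	adc	%R0, %R0, %R4\n"
"	strexd	%1, %0, %H0, [%3]\n"
"	teq	%1, #0\n"
"	bne	1b"
	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
	: "r" (&v->counter), "r" (i)
	: "cc");
}
```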

Will

^ permalink raw reply	[flat|nested] 13+ messages in thread

* updates for be8 patch series
  2013-07-24  1:06         ` Victor Kamensky
  2013-07-24  9:36           ` Will Deacon
@ 2013-07-24 10:46           ` Ben Dooks
  1 sibling, 0 replies; 13+ messages in thread
From: Ben Dooks @ 2013-07-24 10:46 UTC (permalink / raw)
  To: linux-arm-kernel

On 24/07/13 02:06, Victor Kamensky wrote:
> Hi Ben,
>
> Let me split the atomic64 issue off from the others, so we have a focused
> topic here.
>
> Please see the atomic64 test case below, and please try it in your setup.
> I've run it on a Pandaboard ES with my set of patches, but I believe it
> should be the same in your case.
>
> <snip>
>
>>> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h
>>> ===================================================================
>>> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h
>>> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h
>>> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a
>>>
>>>        __asm__ __volatile__("@ atomic64_add\n"
>>>    "1:    ldrexd    %0, %H0, [%3]\n"
>>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>>    "    adds    %0, %0, %4\n"
>>>    "    adc    %H0, %H0, %H4\n"
>>> +#else
>>> +"    adds    %H0, %H0, %H4\n"
>>> +"    adc    %0, %0, %4\n"
>>> +#endif
>>>    "    strexd    %1, %0, %H0, [%3]\n"
>>>    "    teq    %1, #0\n"
>>>    "    bne    1b"
>>> @@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6
>>>
>>>        __asm__ __volatile__("@ atomic64_add_return\n"
>>>    "1:    ldrexd    %0, %H0, [%3]\n"
>>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>>    "    adds    %0, %0, %4\n"
>>>    "    adc    %H0, %H0, %H4\n"
>>> +#else
>>> +"    adds    %H0, %H0, %H4\n"
>>> +"    adc    %0, %0, %4\n"
>>> +#endif
>>>    "    strexd    %1, %0, %H0, [%3]\n"
>>>    "    teq    %1, #0\n"
>>>    "    bne    1b"
>>> @@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a
>>>
>>>        __asm__ __volatile__("@ atomic64_sub\n"
>>>    "1:    ldrexd    %0, %H0, [%3]\n"
>>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>>    "    subs    %0, %0, %4\n"
>>>    "    sbc    %H0, %H0, %H4\n"
>>> +#else
>>> +"    subs    %H0, %H0, %H4\n"
>>> +"    sbc    %0, %0, %4\n"
>>> +#endif
>>>    "    strexd    %1, %0, %H0, [%3]\n"
>>>    "    teq    %1, #0\n"
>>>    "    bne    1b"
>>> @@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6
>>>
>>>        __asm__ __volatile__("@ atomic64_sub_return\n"
>>>    "1:    ldrexd    %0, %H0, [%3]\n"
>>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>>    "    subs    %0, %0, %4\n"
>>>    "    sbc    %H0, %H0, %H4\n"
>>> +#else
>>> +"    subs    %H0, %H0, %H4\n"
>>> +"    sbc    %0, %0, %4\n"
>>> +#endif
>>>    "    strexd    %1, %0, %H0, [%3]\n"
>>>    "    teq    %1, #0\n"
>>>    "    bne    1b"
>>> @@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi
>>>
>>>        __asm__ __volatile__("@ atomic64_dec_if_positive\n"
>>>    "1:    ldrexd    %0, %H0, [%3]\n"
>>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>>    "    subs    %0, %0, #1\n"
>>>    "    sbc    %H0, %H0, #0\n"
>>>    "    teq    %H0, #0\n"
>>> +#else
>>> +"    subs    %H0, %H0, #1\n"
>>> +"    sbc    %0, %0, #0\n"
>>> +"    teq    %0, #0\n"
>>> +#endif
>>>    "    bmi    2f\n"
>>>    "    strexd    %1, %0, %H0, [%3]\n"
>>>    "    teq    %1, #0\n"
>>> @@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at
>>>    "    teqeq    %H0, %H5\n"
>>>    "    moveq    %1, #0\n"
>>>    "    beq    2f\n"
>>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
>>>    "    adds    %0, %0, %6\n"
>>>    "    adc    %H0, %H0, %H6\n"
>>> +#else
>>> +"    adds    %H0, %H0, %H6\n"
>>> +"    adc    %0, %0, %6\n"
>>> +#endif
>>>    "    strexd    %2, %0, %H0, [%4]\n"
>>>    "    teq    %2, #0\n"
>>>    "    bne    1b\n"
>>
>>
>> I still believe that you're doing the wrong thing here
>> and that these can be left well enough alone. Please give
>> a concrete piece of code that can show they're actually
>> doing the wrong thing.
>
> Disable CONFIG_GENERIC_ATOMIC64
> -------------------------------
>
> Make sure that CONFIG_GENERIC_ATOMIC64 is not set. Otherwise atomic64_add from
> lib/atomic64.c will be used; that one works correctly, but then the
> atomic64_XXX inline functions from arch/arm/include/asm/atomic.h are not used.
>
> Personally, with my current kernel (close to 3.10) I failed to do that and
> still get a live kernel, so to illustrate the issue I will use my own copy of
> the inline atomic64_add function. Effectively it shows the same problem
> that I tried to report.

The Kconfig has:

	select GENERIC_ATOMIC64 if (CPU_V7M || CPU_V6 || !CPU_32v6K || !AEABI)

which means it is possible that the generic implementation is being used.


> Compiler used
> -------------
>
> gcc-linaro-arm-linux-gnueabihf-4.7-2013.01-20130125_linux
>
> Test module source
> ------------------
>
> #include <linux/module.h>
> #include <linux/moduleparam.h>
> #include <linux/atomic.h>
>
> static inline void my_atomic64_add_original(u64 i, atomic64_t *v)
> {
>      u64 result;
>      unsigned long tmp;
>
>      __asm__ __volatile__("@ atomic64_add\n"
> "1:    ldrexd    %0, %H0, [%3]\n"
> "    adds    %0, %0, %4\n"
> "    adc    %H0, %H0, %H4\n"
> "    strexd    %1, %0, %H0, [%3]\n"
> "    teq    %1, #0\n"
> "    bne    1b"
>      : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>      : "r" (&v->counter), "r" (i)
>      : "cc");
> }
>
> static inline void my_atomic64_add_fixed(u64 i, atomic64_t *v)
> {
>      u64 result;
>      unsigned long tmp;
>
>      __asm__ __volatile__("@ atomic64_add\n"
> "1:    ldrexd    %0, %H0, [%3]\n"
> #ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */
> "    adds    %0, %0, %4\n"
> "    adc    %H0, %H0, %H4\n"
> #else
> "    adds    %H0, %H0, %H4\n"
> "    adc    %0, %0, %4\n"
> #endif
> "    strexd    %1, %0, %H0, [%3]\n"
> "    teq    %1, #0\n"
> "    bne    1b"
>      : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>      : "r" (&v->counter), "r" (i)
>      : "cc");
> }
>
>
> atomic64_t a = {0};
>
> #define INCREMENT 0x40000000
>
> void atomic_update_original (void)
> {
>      my_atomic64_add_original(INCREMENT, &a);
> }
>
> void atomic_update_fixed (void)
> {
>      my_atomic64_add_fixed(INCREMENT, &a);
> }
>
> void test_atomic64_original(void)
> {
>      int i;
>
>      printk("original atomic64_add\n");
>
>      atomic64_set(&a, 0);
>
>      printk("a = %llx\n", a.counter);
>
>      for (i = 0; i < 10; i++) {
>          atomic_update_original();
>          printk("i = %2d: a =  %llx\n", i, a.counter);
>      }
> }
>
> void test_atomic64_fixed(void)
> {
>      int i;
>
>      printk("fixed atomic64_add\n");
>
>      atomic64_set(&a, 0);
>
>      printk("a = %llx\n", a.counter);
>
>      for (i = 0; i < 10; i++) {
>          atomic_update_fixed();
>          printk("i = %2d: a =  %llx\n", i, a.counter);
>      }
> }
>
> int
> init_module (void)
> {
>      test_atomic64_original();
>      test_atomic64_fixed();
>      return 0;
> }
>
> void
> cleanup_module(void)
> {
> }
>
> MODULE_LICENSE("GPL");
>
>
> Test run
> --------
>
> Compile and run the module; observe that with the original code the counter
> is not incremented correctly.
>
> root@genericarmv7ab:~# insmod atomic64.ko
> [   73.041107] original atomic64_add
> [   73.044647] a = 0
> [   73.046691] i =  0: a =  40000000
> [   73.050659] i =  1: a =  80000000
> [   73.054199] i =  2: a =  c0000000
> [   73.057983] i =  3: a =  0
> [   73.060852] i =  4: a =  40000000
> [   73.064361] i =  5: a =  80000000
> [   73.067932] i =  6: a =  c0000000
> [   73.071441] i =  7: a =  0
> [   73.074310] i =  8: a =  40000000
> [   73.077850] i =  9: a =  80000000
> [   73.081390] fixed atomic64_add
> [   73.084625] a = 0
> [   73.086669] i =  0: a =  40000000
> [   73.090209] i =  1: a =  80000000
> [   73.093749] i =  2: a =  c0000000
> [   73.097290] i =  3: a =  100000000
> [   73.100891] i =  4: a =  140000000
> [   73.104522] i =  5: a =  180000000
> [   73.108154] i =  6: a =  1c0000000
> [   73.111755] i =  7: a =  200000000
> [   73.115386] i =  8: a =  240000000
> [   73.119018] i =  9: a =  280000000
>
>
> Disassembly of the original code
> --------------------------------
>
> (gdb) disassemble atomic_update_original
> Dump of assembler code for function atomic_update_original:
>     0x00000000 <+0>:     push    {r4}        ; (str r4, [sp, #-4]!)
>     0x00000004 <+4>:     mov     r3, #1073741824    ; 0x40000000
>     0x00000008 <+8>:     ldr     r12, [pc, #32]    ; 0x30 <atomic_update_original+48>
>     0x0000000c <+12>:    mov     r2, #0
>     0x00000010 <+16>:    ldrexd  r0, [r12]
>     0x00000014 <+20>:    adds    r0, r0, r2
>     0x00000018 <+24>:    adc     r1, r1, r3
>     0x0000001c <+28>:    strexd  r4, r0, [r12]
>     0x00000020 <+32>:    teq     r4, #0
>     0x00000024 <+36>:    bne     0x10 <atomic_update_original+16>
>     0x00000028 <+40>:    ldmfd   sp!, {r4}
>     0x0000002c <+44>:    bx      lr
>     0x00000030 <+48>:    andeq   r0, r0, r0
> End of assembler dump.
>
> Disassembly of the fixed code
> -----------------------------
>
> (gdb) disassemble atomic_update_fixed
> Dump of assembler code for function atomic_update_fixed:
>     0x00000034 <+0>:     push    {r4}        ; (str r4, [sp, #-4]!)
>     0x00000038 <+4>:     mov     r3, #1073741824    ; 0x40000000
>     0x0000003c <+8>:     ldr     r12, [pc, #32]    ; 0x64 <atomic_update_fixed+48>
>     0x00000040 <+12>:    mov     r2, #0
>     0x00000044 <+16>:    ldrexd  r0, [r12]
>     0x00000048 <+20>:    adds    r1, r1, r3
>     0x0000004c <+24>:    adc     r0, r0, r2
>     0x00000050 <+28>:    strexd  r4, r0, [r12]
>     0x00000054 <+32>:    teq     r4, #0
>     0x00000058 <+36>:    bne     0x44 <atomic_update_fixed+16>
>     0x0000005c <+40>:    ldmfd   sp!, {r4}
>     0x00000060 <+44>:    bx      lr
>     0x00000064 <+48>:    andeq   r0, r0, r0
> End of assembler dump.
>
> Fixed code explanation
> ----------------------
>
> - $r2 has the 4 most significant bytes of the increment (0)
> - $r3 has the 4 least significant bytes of the increment (0x40000000)
>
> - Load the 'long long' at the address in $r12 into $r0 and $r1
>    $r0 will get the 4 most significant bytes (i.e. the 4 bytes at $r12)
>    $r1 will get the 4 least significant bytes (i.e. the 4 bytes at $r12+4)
>    loads are big endian, so $r0 and $r1 will be read with the proper BE swap
>
>     0x00000044 <+16>:    ldrexd  r0, [r12]
>
> - add $r3 to $r1 and store the result into $r1; the carry flag may be set
> - this adds the 4 least significant bytes of the 'long long'
>
>     0x00000048 <+20>:    adds    r1, r1, r3
>
> - add with carry the most significant parts of the increment and counter
>
>     0x0000004c <+24>:    adc     r0, r0, r2
>
> - Store the result back
>    $r0 goes to the address $r12
>    $r1 goes to the address $r12+4
>
>     0x00000050 <+28>:    strexd  r4, r0, [r12]
>
> Ldrd/strd
> ---------
>
> The issue boils down to the fact that in big endian mode the ldrexd/ldrd and
> strexd/strd instructions byteswap only within a 4-byte boundary; they do not
> swap the register numbers themselves. I.e. ldrexd rN, <addr> puts the swapped
> 4 bytes at address <addr> into rN, and the swapped 4 bytes at <addr+4> into rN+1

Looking at the ARM ARM, the following is LDRD.

if ConditionPassed() then
     EncodingSpecificOperations(); NullCheckIfThumbEE(15);
     address = if add then (Align(PC,4) + imm32) else (Align(PC,4) - imm32);
     if HaveLPAE() && address<2:0> == '000' then
         data = MemA[Address,8];
         if BigEndian() then
              R[t] = data<63:32>;
              R[t2] = data<31:0>;
         else
              R[t] = data<31:0>;
              R[t2] = data<63:32>;
     else
         R[t] = MemA[address,4];
         R[t2] = MemA[address+4,4];

I thought it always had R[t] as lowest and R[t2] as the highest.

> Compiler?
> ---------
>
> One may wonder whether we have a compiler bug here. And after all, what is
> the meaning of %0 vs %H0? I asked my local gcc expert for help. Here is what
> I got. Please look at
>
> https://github.com/mirrors/gcc/blob/master/gcc/config/arm/arm.c
>
> and search for 'Q', 'R' and 'H'
>
> according to this page '%HN' is always the reg+1 operand regardless of
> whether it is a big endian system or not. It is the opposite of 'Q'.
>
> Better fix
> ----------
>
> In fact now I believe the correct fix should use 'Q' for the least
> significant 4 bytes, 'R' for the most significant 4 bytes, and keep 'H' in
> the load/store instructions.
>
> It should look something like this:
>
> static inline void my_atomic64_add_fixed1(u64 i, atomic64_t *v)
> {
>      u64 result;
>      unsigned long tmp;
>
>      __asm__ __volatile__("@ atomic64_add\n"
> "1:    ldrexd    %0, %H0, [%3]\n"
> "    adds    %Q0, %Q0, %Q4\n"
> "    adc    %R0, %R0, %R4\n"
> "    strexd    %1, %0, %H0, [%3]\n"
> "    teq    %1, #0\n"
> "    bne    1b"
>      : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>      : "r" (&v->counter), "r" (i)
>      : "cc");
> }
>
> It is better than my original fix because it does not need conditional
> compilation. I've tested it in the module; it works on both LE and BE kernel
> variants.

If you want to make a patch with this in, I can add it to my series
ready to get merged.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius


* updates for be8 patch series
  2013-07-24  7:31         ` Victor Kamensky
@ 2013-07-24 21:24           ` Victor Kamensky
  0 siblings, 0 replies; 13+ messages in thread
From: Victor Kamensky @ 2013-07-24 21:24 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ben,

> I believe your series largely covers the same things, in a more accurate
> way, and should be continued. I plan to review your series and will try to
> integrate it and run it on a Pandaboard ES. Note that for this testing I
> would use a big patch making the TI OMAP4 drivers endian neutral. It will
> take me several days. If I find anything across these days I will let you know.

Here is what I got so far:

setend be under ARM_BE8 or CPU_ENDIAN_BE8 usage
-----------------------------------------------

In many places I see the 'setend be' instruction under ARM_BE8

ARM_BE8(    setend    be )

and ARM_BE8 defined as

+/* Select code for any configuration running in BE8 mode */
+#ifdef CONFIG_CPU_ENDIAN_BE8
+#define ARM_BE8(code...) code
+#else
+#define ARM_BE8(code...)
+#endif

but strictly speaking you need to do that only if the CPU_BE8_BOOT_LE config
option is set. I.e. if the loader is not in little endian mode, you do not need
those instructions; the CPU will already be in BE mode.

IMHO ARM_BE8 should be used only when it is really a BE8-specific issue. There
can be plain big endian issues; those should be handled under the
CPU_BIG_ENDIAN config. There can be an ARM CPU_BIG_ENDIAN system that is not
CPU_ENDIAN_BE8; ARCH_IXP4xx is an example of that.

For example, the 'ARM: pl01x debug code endian fix' patch IMHO should use
CPU_BIG_ENDIAN rather than CONFIG_CPU_ENDIAN_BE8 (as it does now through
ARM_BE8). I.e. if the same serial console output function ran on a
CPU_BIG_ENDIAN system without CONFIG_CPU_ENDIAN_BE8, it would still need such
a byteswap. I understand that it is a bit of a contrived example, since the
code is really BSP specific and this BSP would never have
CONFIG_CPU_ENDIAN_BE8 missing, but I am concerned that the meaning of the
configs gets blurred when they are used like this.

Note you do have BE8_LE defined in a later patch ...

Few places miss atags swaps on the read
---------------------------------------

I've tried to compare your atags reading changes against my byteswap-in-place
code. I agree that byteswapping on read is better. It seems to me that you've
missed a few places where the conversion should happen (unless I am missing
something). Also I think you need to add atag16_to_cpu and cpu_to_atag16 so
that short ATAG values can be handled.

1) arch/arm/kernel/atags_parse.c <global>
            87 __tagtable(ATAG_VIDEOTEXT, parse_tag_videotext);

It needs to swap (only relevant fields listed)

struct tag_videotext {
    __u16        video_page;
    __u16        video_ega_bx;
    __u16        video_points;
};


2) arch/arm/kernel/atags_parse.c <global>
            140 __tagtable(ATAG_CMDLINE, parse_tag_cmdline);

It needs to swap cmdline; as far as I read the code it is a pointer ...
wondering what is going on in the 64-bit case ...

struct tag_cmdline {
    char    cmdline[1];    /* this is the minimum size */
};

3) arch/arm/mach-footbridge/common.c <global>
            51 __tagtable(ATAG_MEMCLK, parse_tag_memclk);

For consistency's sake it is a good idea to convert this too. It needs to swap
fmemclk on the read

struct tag_memclk {
    __u32 fmemclk;
};

4) arch/arm/mach-rpc/riscpc.c <global>
            65 __tagtable(ATAG_ACORN, parse_tag_acorn);

For consistency's sake it is a good idea to convert this too. It needs to swap
the following fields when reading

struct tag_acorn {
    __u32 memc_control_reg;
    __u32 vram_pages;
};

5) arch/arm/mm/init.c <global>
            68 __tagtable(ATAG_INITRD, parse_tag_initrd);
6) arch/arm/mm/init.c <global>
            77 __tagtable(ATAG_INITRD2, parse_tag_initrd2);

Need to swap

struct tag_initrd {
    __u32 start;    /* physical start address */
    __u32 size;    /* size of compressed ramdisk image in bytes */
};

Other than these notes, it looks very very good!

Thanks,
Victor


end of thread, other threads:[~2013-07-24 21:24 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAA3XUr1o0cOCybSR=pCUT-WYP4xus5VrES6Hyzo-Wy4tcutTvQ@mail.gmail.com>
2013-07-23 18:02 ` updates for be8 patch series Ben Dooks
     [not found] <CAA3XUr3h5C0oJVrN5iXsM=NFkLMz_Tssp28CysZfwMNS5KvO-Q@mail.gmail.com>
2013-07-23 17:40 ` Ben Dooks
2013-07-23 17:53   ` Victor Kamensky
2013-07-23 18:02     ` Ben Dooks
2013-07-23 18:03     ` Victor Kamensky
2013-07-23 18:30       ` Ben Dooks
2013-07-24  1:06         ` Victor Kamensky
2013-07-24  9:36           ` Will Deacon
2013-07-24 10:46           ` Ben Dooks
2013-07-24  7:31         ` Victor Kamensky
2013-07-24 21:24           ` Victor Kamensky
2013-07-22 16:33 Ben Dooks
2013-07-22 16:49 ` Ben Dooks
