* updates for be8 patch series [not found] <CAA3XUr1o0cOCybSR=pCUT-WYP4xus5VrES6Hyzo-Wy4tcutTvQ@mail.gmail.com> @ 2013-07-23 18:02 ` Ben Dooks 0 siblings, 0 replies; 13+ messages in thread From: Ben Dooks @ 2013-07-23 18:02 UTC (permalink / raw) To: linux-arm-kernel On 23/07/13 18:40, Victor Kamensky wrote: > [resend in plain-text mode] > > Hi Ben, > > Wrt BE8 little endian instructions you will need to fix couple more places: > ftrace arch/arm/kernel/ftrace.c > kprobes arch/arm/kernel/kprobes.c > Also in big endian mode atomic64_xxx function will need fixes, otherwise perf > counters will be truncated on max of 32bit values > atomic64 arch/arm/include/asm/atomic.h > > I've attached board independent (core) patch from my work that made > Pandaboard ES > kernel run in BE mode. You can check my version of fixes for above issues in the > patch. Look for corresponding files changes. Please rework them > according to style > and rules of your patch series. Note the patch itself contains fixes > for few other > issues that already fixed in your series. I'll cross check and compare > individual > areas latter. I think you may find attached patch interesting as well, > it was developed > independent from yours but matches it very well. > > Wrt of testing: ftrace was tested on Pandaboard ES, krprobes changes were tested > with SystemTap at some point of patch version on Pandaboard ES (not > thumb mode though), > also it may deteriorated while ported across different kernel version, > good idea to > rested last patch. atomic64 were tested on Pandaboard ES and Marvell Armada XP. Please wrap your emails to <80 characters; this was really difficult to read. The atomic64 ops are an interesting one; I still think the in-kernel implementation is correct. Why is atomic64 being used on 32-bit? If you are trying to decompose a 64-bit value into two 32-bit halves, then that's probably the problem. 
The relocation code, the R_ARM instruction relocations should only be instructions and therefore can be safely swapped using the correct op-code accesses. The R_ARM_ABS32 IIRC is always in the same ordering as the CPU would access data. The kprobes stuff I am surprised if it works, due to the fact it calls patch_text() which already uses <asm/opcodes.h> to do the relevant byte-swapping. I think you only need to apply swaps to anything that it loads from memory. Which version of Linux did you patch? I think this is the necessary change to kprobes: diff --git a/arch/arm/kernel/kprobes.c b/arch/arm/kernel/kprobes.c index 170e9f3..c548356 100644 --- a/arch/arm/kernel/kprobes.c +++ b/arch/arm/kernel/kprobes.c @@ -26,6 +26,7 @@ #include <linux/stop_machine.h> #include <linux/stringify.h> #include <asm/traps.h> +#include <asm/opcodes.h> #include <asm/cacheflush.h> #include "kprobes.h" @@ -62,10 +63,10 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p) #ifdef CONFIG_THUMB2_KERNEL thumb = true; addr &= ~1; /* Bit 0 would normally be set to indicate Thumb code */ - insn = ((u16 *)addr)[0]; + insn = __mem_to_opcode_thumb16(((u16 *)addr)[0]); if (is_wide_instruction(insn)) { insn <<= 16; - insn |= ((u16 *)addr)[1]; + insn |= __mem_to_opcode_thumb16(((u16 *)addr)[1]); decode_insn = thumb32_kprobe_decode_insn; } else decode_insn = thumb16_kprobe_decode_insn; @@ -73,7 +74,7 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p) thumb = false; if (addr & 0x3) return -EINVAL; - insn = *p->addr; + insn = __mem_to_opcode_arm(*p->addr); decode_insn = arm_kprobe_decode_insn; #endif -- Ben Dooks http://www.codethink.co.uk/ Senior Engineer Codethink - Providing Genius ^ permalink raw reply related [flat|nested] 13+ messages in thread
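For readers following along, the `__mem_to_opcode_*` helpers in the diff above reduce, on a BE8 kernel, to plain byte swaps. Below is a host-side sketch of what they do; the macro names (without the kernel's `__` prefix) and the simplification to the BE8-only case are mine, and the real macros in `<asm/opcodes.h>` also cover BE32 and the opposite (opcode-to-memory) direction:

```c
#include <stdint.h>

/* In BE8, data accesses are big-endian but ARM/Thumb instructions stay
 * little-endian in memory.  Any opcode fetched with an ordinary data
 * load must therefore be byte-reversed before it is decoded or compared
 * against an instruction constant. */
static inline uint32_t swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

static inline uint16_t swab16(uint16_t x)
{
	return (uint16_t)((x >> 8) | (x << 8));
}

#ifdef CONFIG_CPU_ENDIAN_BE8
#define mem_to_opcode_arm(x)     swab32(x)  /* BE8 kernel: swap to decode */
#define mem_to_opcode_thumb16(x) swab16(x)
#else
#define mem_to_opcode_arm(x)     (x)        /* LE kernel: identity */
#define mem_to_opcode_thumb16(x) (x)
#endif
```

This is why Ben's version of the kprobes fix only touches the loads: once the opcode is in canonical (little-endian) form, the existing decode tables and `patch_text()` paths work unchanged.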
* updates for be8 patch series [not found] <CAA3XUr3h5C0oJVrN5iXsM=NFkLMz_Tssp28CysZfwMNS5KvO-Q@mail.gmail.com> @ 2013-07-23 17:40 ` Ben Dooks 2013-07-23 17:53 ` Victor Kamensky 0 siblings, 1 reply; 13+ messages in thread From: Ben Dooks @ 2013-07-23 17:40 UTC (permalink / raw) To: linux-arm-kernel On 23/07/13 18:24, Victor Kamensky wrote: > Hi Ben, > > Wrt BE8 little endian instructions you will need to fix couple more places: > ftrace arch/arm/kernel/ftrace.c > kprobes arch/arm/kernel/kprobes.c > Also in big endian mode atomic64_xxx function will need fixes, otherwise > perf > counters will be truncated on max of 32bit values > atomic64 arch/arm/include/asm/atomic.h Hmm, thought ftrace already did the right endian-ness conversions, although it is possible it could have missed one. The atomic instructions should work fine for BE8, they're using STRD which should be correct. > I've attached board independent (core) patch from my work that made > Pandaboard ES > kernel run in BE mode. You can check my version of fixes for above > issues in the > patch. Look for corresponding files changes. Please rework them > according to style > and rules of your patch series. Note the patch itself contains fixes for > few other > issues that already fixed in your series. I'll cross check and compare > individual > areas latter. I think you may find attached patch interesting as well, > it was developed > independent from yours but matches its very well. > -- Ben Dooks http://www.codethink.co.uk/ Senior Engineer Codethink - Providing Genius ^ permalink raw reply [flat|nested] 13+ messages in thread
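Ben's point about the load/store side can be made concrete. LDREXD/STREXD (and STRD) move two registers to and from consecutive words without reinterpreting bytes, so a stored 64-bit value round-trips identically on either endianness. A host-side model (illustrative only, not kernel code):

```c
#include <stdint.h>
#include <string.h>

/* Model of the exclusive doubleword accessors: Rt pairs with [addr],
 * Rt2 with [addr + 4], with no byte reinterpretation in between.  A
 * store followed by a load returns the same register pair on any
 * endianness, so any BE8 breakage must come from what the surrounding
 * asm does with the halves, not from the load/store itself. */
struct pair { uint32_t r0, r1; };

static struct pair ldrexd_model(const uint8_t mem[8])
{
	struct pair p;
	memcpy(&p.r0, mem, 4);      /* Rt  <- [addr]     */
	memcpy(&p.r1, mem + 4, 4);  /* Rt2 <- [addr + 4] */
	return p;
}

static void strexd_model(uint8_t mem[8], struct pair p)
{
	memcpy(mem, &p.r0, 4);      /* [addr]     <- Rt  */
	memcpy(mem + 4, &p.r1, 4);  /* [addr + 4] <- Rt2 */
}
```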
* updates for be8 patch series 2013-07-23 17:40 ` Ben Dooks @ 2013-07-23 17:53 ` Victor Kamensky 2013-07-23 18:02 ` Ben Dooks 2013-07-23 18:03 ` Victor Kamensky 0 siblings, 2 replies; 13+ messages in thread From: Victor Kamensky @ 2013-07-23 17:53 UTC (permalink / raw) To: linux-arm-kernel On 23 July 2013 10:40, Ben Dooks <ben.dooks@codethink.co.uk> wrote: > On 23/07/13 18:24, Victor Kamensky wrote: >> >> Hi Ben, >> >> Wrt BE8 little endian instructions you will need to fix couple more >> places: >> ftrace arch/arm/kernel/ftrace.c >> kprobes arch/arm/kernel/kprobes.c >> Also in big endian mode atomic64_xxx function will need fixes, otherwise >> perf >> counters will be truncated on max of 32bit values >> atomic64 arch/arm/include/asm/atomic.h > > > Hmm, thought ftrace already did the right endian-ness conversions, > although it is possible it could have missed one. > > The atomic instructions should work fine for BE8, they're using > STRD which should be correct. atomic64_read and atomic64_set are fine, but atomic64_add, atomic64_add_return, atomic64_sub, atomic64_sub_return, etc are not OK. The issue is in arithmetic operations not load/store. Thanks, Victor > >> I've attached board independent (core) patch from my work that made >> Pandaboard ES >> kernel run in BE mode. You can check my version of fixes for above >> issues in the >> patch. Look for corresponding files changes. Please rework them >> according to style >> and rules of your patch series. Note the patch itself contains fixes for >> few other >> issues that already fixed in your series. I'll cross check and compare >> individual >> areas latter. I think you may find attached patch interesting as well, >> it was developed >> independent from yours but matches its very well. > > >> > > -- > Ben Dooks http://www.codethink.co.uk/ > Senior Engineer Codethink - Providing Genius ^ permalink raw reply [flat|nested] 13+ messages in thread
* updates for be8 patch series 2013-07-23 17:53 ` Victor Kamensky @ 2013-07-23 18:02 ` Ben Dooks 2013-07-23 18:03 ` Victor Kamensky 1 sibling, 0 replies; 13+ messages in thread From: Ben Dooks @ 2013-07-23 18:02 UTC (permalink / raw) To: linux-arm-kernel On 23/07/13 18:53, Victor Kamensky wrote: > On 23 July 2013 10:40, Ben Dooks<ben.dooks@codethink.co.uk> wrote: >> On 23/07/13 18:24, Victor Kamensky wrote: >>> >>> Hi Ben, >>> >>> Wrt BE8 little endian instructions you will need to fix couple more >>> places: >>> ftrace arch/arm/kernel/ftrace.c >>> kprobes arch/arm/kernel/kprobes.c >>> Also in big endian mode atomic64_xxx function will need fixes, otherwise >>> perf >>> counters will be truncated on max of 32bit values >>> atomic64 arch/arm/include/asm/atomic.h >> >> >> Hmm, thought ftrace already did the right endian-ness conversions, >> although it is possible it could have missed one. >> >> The atomic instructions should work fine for BE8, they're using >> STRD which should be correct. > > atomic64_read and atomic64_set are fine, but atomic64_add, > atomic64_add_return, atomic64_sub, atomic64_sub_return, etc are not OK. > The issue is in arithmetic operations not load/store. I'm still surprised these don't work. What exact situation are you seeing the failures in? Code would be very helpful here. -- Ben Dooks http://www.codethink.co.uk/ Senior Engineer Codethink - Providing Genius ^ permalink raw reply [flat|nested] 13+ messages in thread
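A host-side model (an assumed reconstruction, not kernel code) shows the failure mode Victor is describing. GCC's `%0`/`%H0` name the lower- and higher-numbered registers of a 64-bit pair, and on big-endian ARM the lower-numbered register holds the most significant word, so the unmodified asm generates the carry from the wrong addition:

```c
#include <stdint.h>

/* The inline asm in atomic64_add() does:
 *     adds %0,  %0,  %4      @ intended: add low halves, set carry
 *     adc  %H0, %H0, %H4     @ intended: add high halves plus carry
 * On big-endian, %0 is the HIGH half, so "adds" operates on the high
 * words and the carry out of the low-word add is lost: */
static uint64_t atomic64_add_model_be(uint64_t v, uint64_t i)
{
	uint32_t hi = (uint32_t)(v >> 32), lo = (uint32_t)v;
	uint32_t ihi = (uint32_t)(i >> 32), ilo = (uint32_t)i;
	uint32_t carry = (uint32_t)(((uint64_t)hi + ihi) >> 32); /* wrong add */
	hi += ihi;
	lo += ilo + carry;       /* real carry out of lo + ilo is dropped */
	return ((uint64_t)hi << 32) | lo;
}

/* With the operands swapped, as in Victor's patch
 * ("adds %H0, %H0, %H4" then "adc %0, %0, %4"),
 * the carry flows from the low-word add into the high-word add: */
static uint64_t atomic64_add_model_fixed(uint64_t v, uint64_t i)
{
	uint32_t hi = (uint32_t)(v >> 32), lo = (uint32_t)v;
	uint32_t ihi = (uint32_t)(i >> 32), ilo = (uint32_t)i;
	uint32_t carry = (uint32_t)(((uint64_t)lo + ilo) >> 32);
	lo += ilo;
	hi += ihi + carry;
	return ((uint64_t)hi << 32) | lo;
}
```

In the broken model, incrementing a counter past 2^32 wraps it back to zero instead of carrying into the high word, which matches the reported symptom of perf counters truncating at 32-bit values.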
* updates for be8 patch series 2013-07-23 17:53 ` Victor Kamensky 2013-07-23 18:02 ` Ben Dooks @ 2013-07-23 18:03 ` Victor Kamensky 2013-07-23 18:30 ` Ben Dooks 1 sibling, 1 reply; 13+ messages in thread From: Victor Kamensky @ 2013-07-23 18:03 UTC (permalink / raw) To: linux-arm-kernel Hi Ben, Sorry for the multiple posts. I am new to this, and I could not get the mailing list to accept my original email. For the sake of others, so people can see the discussed patch, here it goes again with the patch file inlined now. Hope it will work this time. Wrt BE8 little-endian instructions you will need to fix a couple more places: ftrace arch/arm/kernel/ftrace.c kprobes arch/arm/kernel/kprobes.c Also in big-endian mode the atomic64_xxx functions will need fixes, otherwise perf counters will be truncated at the maximum 32-bit value: atomic64 arch/arm/include/asm/atomic.h I've attached the board-independent (core) patch from my work that made a Pandaboard ES kernel run in BE mode. You can check my version of the fixes for the above issues in the patch; look for the corresponding file changes. Please rework them according to the style and rules of your patch series. Note the patch itself contains fixes for a few other issues that are already fixed in your series; I'll cross-check and compare the individual areas later. I think you may find the attached patch interesting as well: it was developed independently of yours but matches it very well. Wrt testing: ftrace was tested on a Pandaboard ES; the kprobes changes were tested with SystemTap against an earlier version of the patch on a Pandaboard ES (not Thumb mode, though), and they may have deteriorated while being ported across kernel versions, so it would be a good idea to retest the latest version. atomic64 was tested on a Pandaboard ES and a Marvell Armada XP. 
Thanks, Victor Index: linux-linaro-tracking/arch/arm/Makefile =================================================================== --- linux-linaro-tracking.orig/arch/arm/Makefile +++ linux-linaro-tracking/arch/arm/Makefile @@ -16,6 +16,7 @@ LDFLAGS := LDFLAGS_vmlinux :=-p --no-undefined -X ifeq ($(CONFIG_CPU_ENDIAN_BE8),y) LDFLAGS_vmlinux += --be8 +KBUILD_LDFLAGS_MODULE += --be8 endif OBJCOPYFLAGS :=-O binary -R .comment -S @@ -43,12 +44,19 @@ ifeq ($(CONFIG_CC_STACKPROTECTOR),y) KBUILD_CFLAGS +=-fstack-protector endif +# Passing endian compilation flag in both CPPFLAGS ad CFLAGS. There are +# places where CFLAGS alone are used (cmd_record_mcount). And it has to +# be in CPPFLAGS because it affects what macros (__ARMEB__, __BYTE_ORDER__, +# __FLOAT_WORD_ORDER__) are defined. So replication of flag seems to be +# good compromise ifeq ($(CONFIG_CPU_BIG_ENDIAN),y) KBUILD_CPPFLAGS += -mbig-endian +KBUILD_CFLAGS += -mbig-endian AS += -EB LD += -EB else KBUILD_CPPFLAGS += -mlittle-endian +KBUILD_CFLAGS += -mlittle-endian AS += -EL LD += -EL endif Index: linux-linaro-tracking/arch/arm/boot/compressed/Makefile =================================================================== --- linux-linaro-tracking.orig/arch/arm/boot/compressed/Makefile +++ linux-linaro-tracking/arch/arm/boot/compressed/Makefile @@ -138,6 +138,24 @@ endif ifeq ($(CONFIG_CPU_ENDIAN_BE8),y) LDFLAGS_vmlinux += --be8 endif + +# Passing endian compilation flag in both CPPFLAGS ad CFLAGS. There are +# places where CFLAGS alone are used (cmd_record_mcount). And it has to +# be in CPPFLAGS because it affects what macros (__ARMEB__, __BYTE_ORDER__, +# __FLOAT_WORD_ORDER__) are defined. So replication of flag seems to be +# good compromise +ifeq ($(CONFIG_CPU_BIG_ENDIAN),y) +KBUILD_CPPFLAGS += -mbig-endian +KBUILD_CFLAGS += -mbig-endian +AS += -EB +LD += -EB +else +KBUILD_CPPFLAGS += -mlittle-endian +KBUILD_CFLAGS += -mlittle-endian +AS += -EL +LD += -EL +endif + # ? 
LDFLAGS_vmlinux += -p # Report unresolved symbol references Index: linux-linaro-tracking/arch/arm/boot/compressed/head.S =================================================================== --- linux-linaro-tracking.orig/arch/arm/boot/compressed/head.S +++ linux-linaro-tracking/arch/arm/boot/compressed/head.S @@ -141,6 +141,12 @@ start: #endif mov r7, r1 @ save architecture ID mov r8, r2 @ save atags pointer +#ifdef CONFIG_LITTLE_ENDIAN_LOADER + /* + * Switch to bigendian mode + */ + setend be +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ #ifndef __ARM_ARCH_2__ /* Index: linux-linaro-tracking/arch/arm/include/asm/setup.h =================================================================== --- linux-linaro-tracking.orig/arch/arm/include/asm/setup.h +++ linux-linaro-tracking/arch/arm/include/asm/setup.h @@ -53,4 +53,8 @@ extern int arm_add_memory(phys_addr_t st extern void early_print(const char *str, ...); extern void dump_machine_table(void); +#ifdef CONFIG_LITTLE_ENDIAN_LOADER +void byteswap_tags(struct tag *t); +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ + #endif Index: linux-linaro-tracking/arch/arm/kernel/Makefile =================================================================== --- linux-linaro-tracking.orig/arch/arm/kernel/Makefile +++ linux-linaro-tracking/arch/arm/kernel/Makefile @@ -22,6 +22,7 @@ obj-y := elf.o entry-armv.o entry-commo obj-$(CONFIG_ATAGS) += atags_parse.o obj-$(CONFIG_ATAGS_PROC) += atags_proc.o obj-$(CONFIG_DEPRECATED_PARAM_STRUCT) += atags_compat.o +obj-$(CONFIG_LITTLE_ENDIAN_LOADER) += atags_byteswap.o obj-$(CONFIG_OC_ETM) += etm.o obj-$(CONFIG_CPU_IDLE) += cpuidle.o Index: linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c =================================================================== --- /dev/null +++ linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c @@ -0,0 +1,119 @@ +#include <linux/slab.h> +#include <linux/proc_fs.h> +#include <linux/swab.h> +#include <asm/setup.h> +#include <asm/types.h> +#include <asm/page.h> + +#define 
byteswap32(x) (x) = __swab32((x)) +#define byteswap16(x) (x) = __swab16((x)) + +static void __init byteswap_tag_core(struct tag *tag) +{ + byteswap32(tag->u.core.flags); + byteswap32(tag->u.core.pagesize); + byteswap32(tag->u.core.rootdev); +} + +static void __init byteswap_tag_mem32(struct tag *tag) +{ + byteswap32(tag->u.mem.size); + byteswap32(tag->u.mem.start); +} + +static void __init byteswap_tag_videotext(struct tag *tag) +{ + byteswap16(tag->u.videotext.video_page); + byteswap16(tag->u.videotext.video_ega_bx); + byteswap16(tag->u.videotext.video_points); +} + +static void __init byteswap_tag_ramdisk(struct tag *tag) +{ + byteswap32(tag->u.ramdisk.flags); + byteswap32(tag->u.ramdisk.size); + byteswap32(tag->u.ramdisk.start); +} + +static void __init byteswap_tag_initrd(struct tag *tag) +{ + byteswap32(tag->u.initrd.start); + byteswap32(tag->u.initrd.size); +} + +static void __init byteswap_tag_serialnr(struct tag *tag) +{ + byteswap32(tag->u.serialnr.low); + byteswap32(tag->u.serialnr.high); +} + +static void __init byteswap_tag_revision(struct tag *tag) +{ + byteswap32(tag->u.revision.rev); +} + +static void __init byteswap_tag_videolfb(struct tag *tag) +{ + byteswap16(tag->u.videolfb.lfb_width); + byteswap16(tag->u.videolfb.lfb_height); + byteswap16(tag->u.videolfb.lfb_depth); + byteswap16(tag->u.videolfb.lfb_linelength); + byteswap32(tag->u.videolfb.lfb_base); + byteswap32(tag->u.videolfb.lfb_size); +} + +static void __init byteswap_tag_acorn(struct tag *tag) +{ + byteswap32(tag->u.acorn.memc_control_reg); + byteswap32(tag->u.acorn.vram_pages); +} + +static void __init byteswap_tag_memclk(struct tag *tag) +{ + byteswap32(tag->u.memclk.fmemclk); +} + +void __init byteswap_tags(struct tag *t) +{ + for (; t->hdr.size; t = tag_next(t)) { + byteswap32(t->hdr.size); + byteswap32(t->hdr.tag); + switch(t->hdr.tag) { + case ATAG_CORE: + byteswap_tag_core(t); + break; + case ATAG_MEM: + byteswap_tag_mem32(t); + break; + case ATAG_VIDEOTEXT: + 
byteswap_tag_videotext(t); + break; + case ATAG_RAMDISK: + byteswap_tag_ramdisk(t); + break; + case ATAG_INITRD2: + byteswap_tag_initrd(t); + break; + case ATAG_SERIAL: + byteswap_tag_serialnr(t); + break; + case ATAG_REVISION: + byteswap_tag_revision(t); + break; + case ATAG_VIDEOLFB: + byteswap_tag_videolfb(t); + break; + case ATAG_CMDLINE: + break; + case ATAG_ACORN: + byteswap_tag_acorn(t); + break; + case ATAG_MEMCLK: + byteswap_tag_memclk(t); + break; + default: + /* Need to complain? */ + break; + } + } +} Index: linux-linaro-tracking/arch/arm/kernel/head-common.S =================================================================== --- linux-linaro-tracking.orig/arch/arm/kernel/head-common.S +++ linux-linaro-tracking/arch/arm/kernel/head-common.S @@ -48,6 +48,9 @@ __vet_atags: bne 1f ldr r5, [r2, #0] +#ifdef CONFIG_LITTLE_ENDIAN_LOADER + rev r5, r5 +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ #ifdef CONFIG_OF_FLATTREE ldr r6, =OF_DT_MAGIC @ is it a DTB? cmp r5, r6 @@ -57,6 +60,9 @@ __vet_atags: cmpne r5, #ATAG_CORE_SIZE_EMPTY bne 1f ldr r5, [r2, #4] +#ifdef CONFIG_LITTLE_ENDIAN_LOADER + rev r5, r5 +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ ldr r6, =ATAG_CORE cmp r5, r6 bne 1f Index: linux-linaro-tracking/arch/arm/kernel/module.c =================================================================== --- linux-linaro-tracking.orig/arch/arm/kernel/module.c +++ linux-linaro-tracking/arch/arm/kernel/module.c @@ -45,6 +45,65 @@ void *module_alloc(unsigned long size) } #endif +/* + * For relocations in exectuable sections if we configured with + * CONFIG_CPU_ENDIAN_BE8 we assume that code is in little endian byteswap + * form already and we need to byteswap value when we read and when we + * write. If relocaton R_ARM_ABS32 type we assume that it is relocation for + * data piece and we do not perform byteswaps when such relocation applied. + * This herustic based approach may break in future. 
+ * + * It seems that more correct way to handle be8 in kernel module is + * to build module without --be8 option, but perform instruction byteswap + * when module sections are loaded, after all relocation are applied. + * In order to do such byteswap we need to read all mappings symbols ($a, + * $d, $t), sort them, and apply byteswaps for $a and $t entries. + */ +static inline u32 read_section_u32(Elf32_Shdr *dstsec, unsigned long loc) +{ + u32 retval = *(u32 *)loc; + +#ifdef CONFIG_CPU_ENDIAN_BE8 + if (dstsec->sh_flags & SHF_EXECINSTR) { + retval = __swab32(retval); + } +#endif /* CONFIG_CPU_ENDIAN_BE8 */ + return retval; +} + +static inline void write_section_u32(Elf32_Shdr *dstsec, unsigned long loc, u32 value) +{ +#ifdef CONFIG_CPU_ENDIAN_BE8 + if (dstsec->sh_flags & SHF_EXECINSTR) { + value = __swab32(value); + } +#endif /* CONFIG_CPU_ENDIAN_BE8 */ + *(u32 *) loc = value; +} + +#ifdef CONFIG_THUMB2_KERNEL +static inline u16 read_section_u16(Elf32_Shdr *dstsec, unsigned long loc) +{ + u16 retval = *(u16 *) loc; +#ifdef CONFIG_CPU_ENDIAN_BE8 + if (dstsec->sh_flags & SHF_EXECINSTR) { + retval = __swab16(retval); + } +#endif /* CONFIG_CPU_ENDIAN_BE8 */ + return retval; +} + +static inline void write_section_u16(Elf32_Shdr *dstsec, unsigned long loc, u16 value) +{ +#ifdef CONFIG_CPU_ENDIAN_BE8 + if (dstsec->sh_flags & SHF_EXECINSTR) { + value = __swab16(value); + } +#endif /* CONFIG_CPU_ENDIAN_BE8 */ + *(u16 *) loc = value; +} +#endif /* CONFIG_THUMB2_KERNEL */ + int apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex, unsigned int relindex, struct module *module) @@ -60,6 +119,7 @@ apply_relocate(Elf32_Shdr *sechdrs, cons Elf32_Sym *sym; const char *symname; s32 offset; + u32 loc_u32; #ifdef CONFIG_THUMB2_KERNEL u32 upper, lower, sign, j1, j2; #endif @@ -95,7 +155,8 @@ apply_relocate(Elf32_Shdr *sechdrs, cons case R_ARM_PC24: case R_ARM_CALL: case R_ARM_JUMP24: - offset = (*(u32 *)loc & 0x00ffffff) << 2; + loc_u32 = 
read_section_u32(dstsec, loc); + offset = (loc_u32 & 0x00ffffff) << 2; if (offset & 0x02000000) offset -= 0x04000000; @@ -112,27 +173,33 @@ apply_relocate(Elf32_Shdr *sechdrs, cons offset >>= 2; - *(u32 *)loc &= 0xff000000; - *(u32 *)loc |= offset & 0x00ffffff; + loc_u32 &= 0xff000000; + loc_u32 |= offset & 0x00ffffff; + write_section_u32(dstsec, loc, loc_u32); break; - case R_ARM_V4BX: + case R_ARM_V4BX: + loc_u32 = read_section_u32(dstsec, loc); /* Preserve Rm and the condition code. Alter * other bits to re-code instruction as * MOV PC,Rm. */ - *(u32 *)loc &= 0xf000000f; - *(u32 *)loc |= 0x01a0f000; + loc_u32 &= 0xf000000f; + loc_u32 |= 0x01a0f000; + write_section_u32(dstsec, loc, loc_u32); break; case R_ARM_PREL31: - offset = *(u32 *)loc + sym->st_value - loc; - *(u32 *)loc = offset & 0x7fffffff; + loc_u32 = read_section_u32(dstsec, loc); + offset = loc_u32 + sym->st_value - loc; + loc_u32 = offset & 0x7fffffff; + write_section_u32(dstsec, loc, loc_u32); break; case R_ARM_MOVW_ABS_NC: case R_ARM_MOVT_ABS: - offset = *(u32 *)loc; + loc_u32 = read_section_u32(dstsec, loc); + offset = loc_u32; offset = ((offset & 0xf0000) >> 4) | (offset & 0xfff); offset = (offset ^ 0x8000) - 0x8000; @@ -140,16 +207,17 @@ apply_relocate(Elf32_Shdr *sechdrs, cons if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS) offset >>= 16; - *(u32 *)loc &= 0xfff0f000; - *(u32 *)loc |= ((offset & 0xf000) << 4) | - (offset & 0x0fff); + loc_u32 &= 0xfff0f000; + loc_u32 |= ((offset & 0xf000) << 4) | + (offset & 0x0fff); + write_section_u32(dstsec, loc, loc_u32); break; #ifdef CONFIG_THUMB2_KERNEL case R_ARM_THM_CALL: case R_ARM_THM_JUMP24: - upper = *(u16 *)loc; - lower = *(u16 *)(loc + 2); + upper = read_section_u16(dstsec, loc); + lower = read_section_u16(dstsec, loc + 2); /* * 25 bit signed address range (Thumb-2 BL and B.W @@ -198,17 +266,19 @@ apply_relocate(Elf32_Shdr *sechdrs, cons sign = (offset >> 24) & 1; j1 = sign ^ (~(offset >> 23) & 1); j2 = sign ^ (~(offset >> 22) & 1); - *(u16 *)loc = 
(u16)((upper & 0xf800) | (sign << 10) | - ((offset >> 12) & 0x03ff)); - *(u16 *)(loc + 2) = (u16)((lower & 0xd000) | - (j1 << 13) | (j2 << 11) | - ((offset >> 1) & 0x07ff)); + write_section_u16(dstsec, loc, + (u16)((upper & 0xf800) | (sign << 10) | + ((offset >> 12) & 0x03ff))); + write_section_u16(dstsec, loc + 2, + (u16)((lower & 0xd000) | + (j1 << 13) | (j2 << 11) | + ((offset >> 1) & 0x07ff))); break; case R_ARM_THM_MOVW_ABS_NC: case R_ARM_THM_MOVT_ABS: - upper = *(u16 *)loc; - lower = *(u16 *)(loc + 2); + upper = read_section_u16(dstsec, loc); + lower = read_section_u16(dstsec, loc + 2); /* * MOVT/MOVW instructions encoding in Thumb-2: @@ -229,12 +299,14 @@ apply_relocate(Elf32_Shdr *sechdrs, cons if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_ABS) offset >>= 16; - *(u16 *)loc = (u16)((upper & 0xfbf0) | - ((offset & 0xf000) >> 12) | - ((offset & 0x0800) >> 1)); - *(u16 *)(loc + 2) = (u16)((lower & 0x8f00) | - ((offset & 0x0700) << 4) | - (offset & 0x00ff)); + write_section_u16(dstsec, loc, + (u16)((upper & 0xfbf0) | + ((offset & 0xf000) >> 12) | + ((offset & 0x0800) >> 1))); + write_section_u16(dstsec, loc + 2, + (u16)((lower & 0x8f00) | + ((offset & 0x0700) << 4) | + (offset & 0x00ff))); break; #endif Index: linux-linaro-tracking/arch/arm/mm/Kconfig =================================================================== --- linux-linaro-tracking.orig/arch/arm/mm/Kconfig +++ linux-linaro-tracking/arch/arm/mm/Kconfig @@ -695,6 +695,15 @@ config CPU_ENDIAN_BE32 help Support for the BE-32 (big-endian) mode on pre-ARMv6 processors. 
+config LITTLE_ENDIAN_LOADER + bool + depends on CPU_BIG_ENDIAN + default y + help + If Linux should operate in big-endian mode, but bootloader (i.e u-boot) + is still in little endian mode and passes boot information in little + endian form set this flag + config CPU_HIGH_VECTOR depends on !MMU && CPU_CP15 && !CPU_ARM740T bool "Select the High exception vector" Index: linux-linaro-tracking/arch/arm/mm/alignment.c =================================================================== --- linux-linaro-tracking.orig/arch/arm/mm/alignment.c +++ linux-linaro-tracking/arch/arm/mm/alignment.c @@ -792,6 +792,10 @@ do_alignment(unsigned long addr, unsigne regs->ARM_pc += isize; +#ifdef CONFIG_CPU_ENDIAN_BE8 + instr = __swab32(instr); /* little endian instruction */ +#endif /* CONFIG_CPU_ENDIAN_BE8 */ + switch (CODING_BITS(instr)) { case 0x00000000: /* 3.13.4 load/store instruction extensions */ if (LDSTHD_I_BIT(instr)) Index: linux-linaro-tracking/arch/arm/kernel/head.S =================================================================== --- linux-linaro-tracking.orig/arch/arm/kernel/head.S +++ linux-linaro-tracking/arch/arm/kernel/head.S @@ -89,6 +89,12 @@ ENTRY(stext) @ ensure svc mode and all interrupts masked safe_svcmode_maskall r9 +#ifdef CONFIG_LITTLE_ENDIAN_LOADER + /* + * Switch to bigendian mode + */ + setend be +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ mrc p15, 0, r9, c0, c0 @ get processor id bl __lookup_processor_type @ r5=procinfo r9=cpuid movs r10, r5 @ invalid processor (r5=0)? @@ -356,6 +362,12 @@ ENTRY(secondary_startup) #endif safe_svcmode_maskall r9 +#ifdef CONFIG_LITTLE_ENDIAN_LOADER + /* + * Switch to bigendian mode + */ + setend be +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ mrc p15, 0, r9, c0, c0 @ get processor id bl __lookup_processor_type movs r10, r5 @ invalid processor? 
@@ -427,6 +439,9 @@ __enable_mmu: #ifdef CONFIG_CPU_ICACHE_DISABLE bic r0, r0, #CR_I #endif +#ifdef CONFIG_CPU_ENDIAN_BE8 + orr r0, r0, #CR_EE @ big-endian page tables +#endif #ifdef CONFIG_ARM_LPAE mov r5, #0 mcrr p15, 0, r4, r5, c2 @ load TTBR0 @@ -592,8 +607,14 @@ __fixup_a_pv_table: b 2f 1: add r7, r3 ldrh ip, [r7, #2] +#ifdef __ARMEB__ + rev ip, ip @ byteswap instruction +#endif /* __ARMEB__ */ and ip, 0x8f00 orr ip, r6 @ mask in offset bits 31-24 +#ifdef __ARMEB__ + rev ip, ip @ byteswap instruction +#endif /* __ARMEB__ */ strh ip, [r7, #2] 2: cmp r4, r5 ldrcc r7, [r4], #4 @ use branch for delay slot @@ -602,8 +623,14 @@ __fixup_a_pv_table: #else b 2f 1: ldr ip, [r7, r3] +#ifdef __ARMEB__ + rev ip, ip @ byteswap instruction +#endif /* __ARMEB__ */ bic ip, ip, #0x000000ff orr ip, ip, r6 @ mask in offset bits 31-24 +#ifdef __ARMEB__ + rev ip, ip @ byteswap instruction +#endif /* __ARMEB__ */ str ip, [r7, r3] 2: cmp r4, r5 ldrcc r7, [r4], #4 @ use branch for delay slot Index: linux-linaro-tracking/arch/arm/kernel/kprobes.c =================================================================== --- linux-linaro-tracking.orig/arch/arm/kernel/kprobes.c +++ linux-linaro-tracking/arch/arm/kernel/kprobes.c @@ -66,14 +66,24 @@ int __kprobes arch_prepare_kprobe(struct if (is_wide_instruction(insn)) { insn <<= 16; insn |= ((u16 *)addr)[1]; +#ifdef CONFIG_CPU_ENDIAN_BE8 + insn = __swab32(insn); /* little endian instruction */ +#endif decode_insn = thumb32_kprobe_decode_insn; - } else - decode_insn = thumb16_kprobe_decode_insn; + } else { + decode_insn = thumb16_kprobe_decode_insn; +#ifdef CONFIG_CPU_ENDIAN_BE8 + insn = __swab16(insn); /* little endian instruction */ +#endif + } #else /* !CONFIG_THUMB2_KERNEL */ thumb = false; if (addr & 0x3) return -EINVAL; insn = *p->addr; +#ifdef CONFIG_CPU_ENDIAN_BE8 + insn = __swab32(insn); /* little endian instruction */ +#endif decode_insn = arm_kprobe_decode_insn; #endif @@ -89,7 +99,11 @@ int __kprobes arch_prepare_kprobe(struct if 
(!p->ainsn.insn) return -ENOMEM; for (is = 0; is < MAX_INSN_SIZE; ++is) +#ifndef CONFIG_CPU_ENDIAN_BE8 p->ainsn.insn[is] = tmp_insn[is]; +#else + p->ainsn.insn[is] = __swab32(tmp_insn[is]); /* little endian instruction */ +#endif flush_insns(p->ainsn.insn, sizeof(p->ainsn.insn[0]) * MAX_INSN_SIZE); p->ainsn.insn_fn = (kprobe_insn_fn_t *) @@ -113,10 +127,17 @@ void __kprobes arch_arm_kprobe(struct kp /* Remove any Thumb flag */ addr = (void *)((uintptr_t)p->addr & ~1); - if (is_wide_instruction(p->opcode)) + if (is_wide_instruction(p->opcode)) { brkp = KPROBE_THUMB32_BREAKPOINT_INSTRUCTION; - else +#ifdef CONFIG_CPU_ENDIAN_BE8 + brkp = __swab32(brkp); +#endif /* CONFIG_CPU_ENDIAN_BE8 */ + } else { brkp = KPROBE_THUMB16_BREAKPOINT_INSTRUCTION; +#ifdef CONFIG_CPU_ENDIAN_BE8 + brkp = __swab16(brkp); +#endif /* CONFIG_CPU_ENDIAN_BE8 */ + } } else { kprobe_opcode_t insn = p->opcode; @@ -127,6 +148,10 @@ void __kprobes arch_arm_kprobe(struct kp brkp |= 0xe0000000; /* Unconditional instruction */ else brkp |= insn & 0xf0000000; /* Copy condition from insn */ +#ifdef CONFIG_CPU_ENDIAN_BE8 + brkp = __swab32(brkp); +#endif /* CONFIG_CPU_ENDIAN_BE8 */ + } patch_text(addr, brkp); @@ -144,8 +169,12 @@ int __kprobes __arch_disarm_kprobe(void { struct kprobe *kp = p; void *addr = (void *)((uintptr_t)kp->addr & ~1); + kprobe_opcode_t insn = kp->opcode; - __patch_text(addr, kp->opcode); +#ifdef CONFIG_CPU_ENDIAN_BE8 + insn = __swab32(insn); /* little endian instruction */ +#endif + __patch_text(addr, insn); return 0; } Index: linux-linaro-tracking/arch/arm/kernel/traps.c =================================================================== --- linux-linaro-tracking.orig/arch/arm/kernel/traps.c +++ linux-linaro-tracking/arch/arm/kernel/traps.c @@ -426,6 +426,10 @@ asmlinkage void __exception do_undefinst goto die_sig; } +#ifdef CONFIG_CPU_ENDIAN_BE8 + instr = __swab32(instr); /* little endian instruction */ +#endif /* CONFIG_CPU_ENDIAN_BE8 */ + if (call_undef_hook(regs, instr) == 0) 
return; Index: linux-linaro-tracking/arch/arm/kernel/ftrace.c =================================================================== --- linux-linaro-tracking.orig/arch/arm/kernel/ftrace.c +++ linux-linaro-tracking/arch/arm/kernel/ftrace.c @@ -100,10 +100,18 @@ static int ftrace_modify_code(unsigned l if (probe_kernel_read(&replaced, (void *)pc, MCOUNT_INSN_SIZE)) return -EFAULT; +#ifdef CONFIG_CPU_ENDIAN_BE8 + replaced = __swab32(replaced); /* little endian instruction */ +#endif /* CONFIG_CPU_ENDIAN_BE8 */ + if (replaced != old) return -EINVAL; } +#ifdef CONFIG_CPU_ENDIAN_BE8 + new = __swab32(new); /* little endian instruction */ +#endif /* CONFIG_CPU_ENDIAN_BE8 */ + if (probe_kernel_write((void *)pc, &new, MCOUNT_INSN_SIZE)) return -EPERM; Index: linux-linaro-tracking/arch/arm/kernel/atags_parse.c =================================================================== --- linux-linaro-tracking.orig/arch/arm/kernel/atags_parse.c +++ linux-linaro-tracking/arch/arm/kernel/atags_parse.c @@ -216,6 +216,10 @@ struct machine_desc * __init setup_machi if (tags->hdr.tag != ATAG_CORE) convert_to_tag_list(tags); #endif +#ifdef CONFIG_LITTLE_ENDIAN_LOADER + byteswap_tags(tags); +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ + if (tags->hdr.tag != ATAG_CORE) { early_print("Warning: Neither atags nor dtb found\n"); tags = (struct tag *)&default_tags; Index: linux-linaro-tracking/arch/arm/kernel/sleep.S =================================================================== --- linux-linaro-tracking.orig/arch/arm/kernel/sleep.S +++ linux-linaro-tracking/arch/arm/kernel/sleep.S @@ -81,6 +81,13 @@ ENDPROC(cpu_resume_after_mmu) .data .align ENTRY(cpu_resume) +#ifdef CONFIG_LITTLE_ENDIAN_LOADER + /* + * Switch to bigendian mode. 
+ * Platform resume code most likely already did that, but just in case + */ + setend be +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ #ifdef CONFIG_SMP mov r1, #0 @ fall-back logical index for UP ALT_SMP(mrc p15, 0, r0, c0, c0, 5) Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h =================================================================== --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a __asm__ __volatile__("@ atomic64_add\n" "1: ldrexd %0, %H0, [%3]\n" +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ " adds %0, %0, %4\n" " adc %H0, %H0, %H4\n" +#else +" adds %H0, %H0, %H4\n" +" adc %0, %0, %4\n" +#endif " strexd %1, %0, %H0, [%3]\n" " teq %1, #0\n" " bne 1b" @@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6 __asm__ __volatile__("@ atomic64_add_return\n" "1: ldrexd %0, %H0, [%3]\n" +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ " adds %0, %0, %4\n" " adc %H0, %H0, %H4\n" +#else +" adds %H0, %H0, %H4\n" +" adc %0, %0, %4\n" +#endif " strexd %1, %0, %H0, [%3]\n" " teq %1, #0\n" " bne 1b" @@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a __asm__ __volatile__("@ atomic64_sub\n" "1: ldrexd %0, %H0, [%3]\n" +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ " subs %0, %0, %4\n" " sbc %H0, %H0, %H4\n" +#else +" subs %H0, %H0, %H4\n" +" sbc %0, %0, %4\n" +#endif " strexd %1, %0, %H0, [%3]\n" " teq %1, #0\n" " bne 1b" @@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6 __asm__ __volatile__("@ atomic64_sub_return\n" "1: ldrexd %0, %H0, [%3]\n" +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ " subs %0, %0, %4\n" " sbc %H0, %H0, %H4\n" +#else +" subs %H0, %H0, %H4\n" +" sbc %0, %0, %4\n" +#endif " strexd %1, %0, %H0, [%3]\n" " teq %1, #0\n" " bne 1b" @@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi __asm__ __volatile__("@ atomic64_dec_if_positive\n" "1: ldrexd %0, %H0, [%3]\n" 
+#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ " subs %0, %0, #1\n" " sbc %H0, %H0, #0\n" " teq %H0, #0\n" +#else +" subs %H0, %H0, #1\n" +" sbc %0, %0, #0\n" +" teq %0, #0\n" +#endif " bmi 2f\n" " strexd %1, %0, %H0, [%3]\n" " teq %1, #0\n" @@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at " teqeq %H0, %H5\n" " moveq %1, #0\n" " beq 2f\n" +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ " adds %0, %0, %6\n" " adc %H0, %H0, %H6\n" +#else +" adds %H0, %H0, %H6\n" +" adc %0, %0, %6\n" +#endif " strexd %2, %0, %H0, [%4]\n" " teq %2, #0\n" " bne 1b\n" On 23 July 2013 10:53, Victor Kamensky <victor.kamensky@linaro.org> wrote: > On 23 July 2013 10:40, Ben Dooks <ben.dooks@codethink.co.uk> wrote: >> On 23/07/13 18:24, Victor Kamensky wrote: >>> >>> Hi Ben, >>> >>> Wrt BE8 little endian instructions you will need to fix couple more >>> places: >>> ftrace arch/arm/kernel/ftrace.c >>> kprobes arch/arm/kernel/kprobes.c >>> Also in big endian mode atomic64_xxx function will need fixes, otherwise >>> perf >>> counters will be truncated on max of 32bit values >>> atomic64 arch/arm/include/asm/atomic.h >> >> >> Hmm, thought ftrace already did the right endian-ness conversions, >> although it is possible it could have missed one. >> >> The atomic instructions should work fine for BE8, they're using >> STRD which should be correct. > > atomic64_read and atomic64_set are fine, but atomic64_add, > atomic64_add_return, atomic64_sub, atomic64_sub_return, etc are not OK. > The issue is in arithmetic operations not load/store. > > Thanks, > Victor > >> >>> I've attached board independent (core) patch from my work that made >>> Pandaboard ES >>> kernel run in BE mode. You can check my version of fixes for above >>> issues in the >>> patch. Look for corresponding files changes. Please rework them >>> according to style >>> and rules of your patch series. Note the patch itself contains fixes for >>> few other >>> issues that already fixed in your series. 
I'll cross-check and compare >>> individual >>> areas later. I think you may find the attached patch interesting as well, >>> it was developed >>> independently of yours but matches it very well. >> >> >>> >> >> -- >> Ben Dooks http://www.codethink.co.uk/ >> Senior Engineer Codethink - Providing Genius ^ permalink raw reply [flat|nested] 13+ messages in thread
* updates for be8 patch series 2013-07-23 18:03 ` Victor Kamensky @ 2013-07-23 18:30 ` Ben Dooks 2013-07-24 1:06 ` Victor Kamensky 2013-07-24 7:31 ` Victor Kamensky 0 siblings, 2 replies; 13+ messages in thread From: Ben Dooks @ 2013-07-23 18:30 UTC (permalink / raw) To: linux-arm-kernel On 23/07/13 19:03, Victor Kamensky wrote: > Hi Ben, > > Sorry for the multiple posts. I am new to this, and I could not get the mailing > list to accept my original email. > For the sake of others, so people can see the discussed patch, here it goes > again with the patch file inlined > now. Hope it will work this time. > > Wrt BE8 little-endian instructions you will need to fix a couple more places: > ftrace arch/arm/kernel/ftrace.c > kprobes arch/arm/kernel/kprobes.c > Also, in big-endian mode the atomic64_xxx functions will need fixes, otherwise perf > counters will be truncated at a maximum of 32-bit values: > atomic64 arch/arm/include/asm/atomic.h > > I've attached the board-independent (core) patch from my work that made a > Pandaboard ES > kernel run in BE mode. You can check my version of the fixes for the above issues in the > patch. Look for the corresponding file changes. Please rework them > according to the style > and rules of your patch series. Note the patch itself contains fixes > for a few other > issues that are already fixed in your series. I'll cross-check and compare the > individual > areas later. I think you may find the attached patch interesting as well; > it was developed > independently of yours but matches it very well. > > Wrt testing: ftrace was tested on Pandaboard ES; the kprobes changes were tested > with SystemTap at some point in the patch's history on Pandaboard ES (not > Thumb mode though); > they may also have deteriorated while being ported across kernel versions, so it is a > good idea to > retest the last patch. atomic64 was tested on Pandaboard ES and Marvell Armada XP.
> > Thanks, > Victor First note: please always format your emails (where possible) to under 80 characters per line so that they do not get mangled when viewed on limited screens (for example, I often read email via a 24x80 terminal). Second note: when producing patches, it is better to try to split them into a series of patches to reach a goal than to post just one large patch. It keeps a lot of useful information (see patch comments) and it means that if there is a problem, then bits can be lifted out to check. > Index: linux-linaro-tracking/arch/arm/Makefile > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/Makefile > +++ linux-linaro-tracking/arch/arm/Makefile > @@ -16,6 +16,7 @@ LDFLAGS := > LDFLAGS_vmlinux :=-p --no-undefined -X > ifeq ($(CONFIG_CPU_ENDIAN_BE8),y) > LDFLAGS_vmlinux += --be8 > +KBUILD_LDFLAGS_MODULE += --be8 > endif > > OBJCOPYFLAGS :=-O binary -R .comment -S > @@ -43,12 +44,19 @@ ifeq ($(CONFIG_CC_STACKPROTECTOR),y) > KBUILD_CFLAGS +=-fstack-protector > endif > > +# Passing the endian compilation flag in both CPPFLAGS and CFLAGS. There are > +# places where CFLAGS alone are used (cmd_record_mcount). And it has to > +# be in CPPFLAGS because it affects which macros (__ARMEB__, __BYTE_ORDER__, > +# __FLOAT_WORD_ORDER__) are defined. So replicating the flag seems to be > +# a good compromise > ifeq ($(CONFIG_CPU_BIG_ENDIAN),y) > KBUILD_CPPFLAGS += -mbig-endian > +KBUILD_CFLAGS += -mbig-endian > AS += -EB > LD += -EB > else > KBUILD_CPPFLAGS += -mlittle-endian > +KBUILD_CFLAGS += -mlittle-endian > AS += -EL > LD += -EL > endif See the comment below.
> Index: linux-linaro-tracking/arch/arm/boot/compressed/Makefile > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/boot/compressed/Makefile > +++ linux-linaro-tracking/arch/arm/boot/compressed/Makefile > @@ -138,6 +138,24 @@ endif > ifeq ($(CONFIG_CPU_ENDIAN_BE8),y) > LDFLAGS_vmlinux += --be8 > endif > + > +# Passing the endian compilation flag in both CPPFLAGS and CFLAGS. There are > +# places where CFLAGS alone are used (cmd_record_mcount). And it has to > +# be in CPPFLAGS because it affects which macros (__ARMEB__, __BYTE_ORDER__, > +# __FLOAT_WORD_ORDER__) are defined. So replicating the flag seems to be > +# a good compromise > +ifeq ($(CONFIG_CPU_BIG_ENDIAN),y) > +KBUILD_CPPFLAGS += -mbig-endian > +KBUILD_CFLAGS += -mbig-endian > +AS += -EB > +LD += -EB > +else > +KBUILD_CPPFLAGS += -mlittle-endian > +KBUILD_CFLAGS += -mlittle-endian > +AS += -EL > +LD += -EL > +endif > + > # ? > LDFLAGS_vmlinux += -p > # Report unresolved symbol references I've not been bitten by this one. Is there anyone else who can comment on KBUILD_CFLAGS vs KBUILD_CPPFLAGS? I had assumed that arch/arm/boot/compressed/Makefile would have inherited the flags from arch/arm/Makefile. Someone else would be better qualified to comment on this.
> Index: linux-linaro-tracking/arch/arm/boot/compressed/head.S > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/boot/compressed/head.S > +++ linux-linaro-tracking/arch/arm/boot/compressed/head.S > @@ -141,6 +141,12 @@ start: > #endif > mov r7, r1 @ save architecture ID > mov r8, r2 @ save atags pointer > +#ifdef CONFIG_LITTLE_ENDIAN_LOADER > + /* > + * Switch to bigendian mode > + */ > + setend be > +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ > > #ifndef __ARM_ARCH_2__ > /* > Index: linux-linaro-tracking/arch/arm/include/asm/setup.h > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/include/asm/setup.h > +++ linux-linaro-tracking/arch/arm/include/asm/setup.h > @@ -53,4 +53,8 @@ extern int arm_add_memory(phys_addr_t st > extern void early_print(const char *str, ...); > extern void dump_machine_table(void); > > +#ifdef CONFIG_LITTLE_ENDIAN_LOADER > +void byteswap_tags(struct tag *t); > +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ > + > #endif You could have done #ifdef CONFIG_LITTLE_ENDIAN_LOADER void byteswap_tags(struct tag *t); #else static inline void byteswap_tags(struct tag *t) { } #endif /* CONFIG_LITTLE_ENDIAN_LOADER */ > Index: linux-linaro-tracking/arch/arm/kernel/Makefile > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/kernel/Makefile > +++ linux-linaro-tracking/arch/arm/kernel/Makefile > @@ -22,6 +22,7 @@ obj-y := elf.o entry-armv.o entry-commo > obj-$(CONFIG_ATAGS) += atags_parse.o > obj-$(CONFIG_ATAGS_PROC) += atags_proc.o > obj-$(CONFIG_DEPRECATED_PARAM_STRUCT) += atags_compat.o > +obj-$(CONFIG_LITTLE_ENDIAN_LOADER) += atags_byteswap.o > > obj-$(CONFIG_OC_ETM) += etm.o > obj-$(CONFIG_CPU_IDLE) += cpuidle.o > Index: linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c > =================================================================== > --- /dev/null > +++ 
linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c > @@ -0,0 +1,119 @@ > +#include<linux/slab.h> > +#include<linux/proc_fs.h> > +#include<linux/swab.h> > +#include<asm/setup.h> > +#include<asm/types.h> > +#include<asm/page.h> > + > +#define byteswap32(x) (x) = __swab32((x)) > +#define byteswap16(x) (x) = __swab16((x)) > + > +static void __init byteswap_tag_core(struct tag *tag) > +{ > + byteswap32(tag->u.core.flags); > + byteswap32(tag->u.core.pagesize); > + byteswap32(tag->u.core.rootdev); > +} > + > +static void __init byteswap_tag_mem32(struct tag *tag) > +{ > + byteswap32(tag->u.mem.size); > + byteswap32(tag->u.mem.start); > +} > + > +static void __init byteswap_tag_videotext(struct tag *tag) > +{ > + byteswap16(tag->u.videotext.video_page); > + byteswap16(tag->u.videotext.video_ega_bx); > + byteswap16(tag->u.videotext.video_points); > +} > + > +static void __init byteswap_tag_ramdisk(struct tag *tag) > +{ > + byteswap32(tag->u.ramdisk.flags); > + byteswap32(tag->u.ramdisk.size); > + byteswap32(tag->u.ramdisk.start); > +} > + > +static void __init byteswap_tag_initrd(struct tag *tag) > +{ > + byteswap32(tag->u.initrd.start); > + byteswap32(tag->u.initrd.size); > +} > + > +static void __init byteswap_tag_serialnr(struct tag *tag) > +{ > + byteswap32(tag->u.serialnr.low); > + byteswap32(tag->u.serialnr.high); > +} > + > +static void __init byteswap_tag_revision(struct tag *tag) > +{ > + byteswap32(tag->u.revision.rev); > +} > + > +static void __init byteswap_tag_videolfb(struct tag *tag) > +{ > + byteswap16(tag->u.videolfb.lfb_width); > + byteswap16(tag->u.videolfb.lfb_height); > + byteswap16(tag->u.videolfb.lfb_depth); > + byteswap16(tag->u.videolfb.lfb_linelength); > + byteswap32(tag->u.videolfb.lfb_base); > + byteswap32(tag->u.videolfb.lfb_size); > +} > + > +static void __init byteswap_tag_acorn(struct tag *tag) > +{ > + byteswap32(tag->u.acorn.memc_control_reg); > + byteswap32(tag->u.acorn.vram_pages); > +} > + > +static void __init 
byteswap_tag_memclk(struct tag *tag) > +{ > + byteswap32(tag->u.memclk.fmemclk); > +} > + > +void __init byteswap_tags(struct tag *t) > +{ > + for (; t->hdr.size; t = tag_next(t)) { > + byteswap32(t->hdr.size); > + byteswap32(t->hdr.tag); > + switch(t->hdr.tag) { > + case ATAG_CORE: > + byteswap_tag_core(t); > + break; > + case ATAG_MEM: > + byteswap_tag_mem32(t); > + break; > + case ATAG_VIDEOTEXT: > + byteswap_tag_videotext(t); > + break; > + case ATAG_RAMDISK: > + byteswap_tag_ramdisk(t); > + break; > + case ATAG_INITRD2: > + byteswap_tag_initrd(t); > + break; > + case ATAG_SERIAL: > + byteswap_tag_serialnr(t); > + break; > + case ATAG_REVISION: > + byteswap_tag_revision(t); > + break; > + case ATAG_VIDEOLFB: > + byteswap_tag_videolfb(t); > + break; > + case ATAG_CMDLINE: > + break; > + case ATAG_ACORN: > + byteswap_tag_acorn(t); > + break; > + case ATAG_MEMCLK: > + byteswap_tag_memclk(t); > + break; > + default: > + /* Need to complain? */ > + break; > + } > + } > +} I think swapping on read is a better idea, as it leaves these in their original format. PS, looks like you either have a formatting issue with your email client or your code. Did this pass checkpatch? > Index: linux-linaro-tracking/arch/arm/kernel/head-common.S > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/kernel/head-common.S > +++ linux-linaro-tracking/arch/arm/kernel/head-common.S > @@ -48,6 +48,9 @@ __vet_atags: > bne 1f > > ldr r5, [r2, #0] > +#ifdef CONFIG_LITTLE_ENDIAN_LOADER > + rev r5, r5 > +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ > #ifdef CONFIG_OF_FLATTREE > ldr r6, =OF_DT_MAGIC @ is it a DTB? > cmp r5, r6 > @@ -57,6 +60,9 @@ __vet_atags: > cmpne r5, #ATAG_CORE_SIZE_EMPTY > bne 1f > ldr r5, [r2, #4] > +#ifdef CONFIG_LITTLE_ENDIAN_LOADER > + rev r5, r5 > +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ > ldr r6, =ATAG_CORE > cmp r5, r6 > bne 1f bit-untidy. 
also, does OF_DT_MAGIC get changed by the endian-ness, or is that somewhere else in the head code? > Index: linux-linaro-tracking/arch/arm/kernel/module.c > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/kernel/module.c > +++ linux-linaro-tracking/arch/arm/kernel/module.c > @@ -45,6 +45,65 @@ void *module_alloc(unsigned long size) > } > #endif > > +/* > + * For relocations in executable sections, if we are configured with > + * CONFIG_CPU_ENDIAN_BE8 we assume that the code is already in little-endian > + * byte-swapped form and we need to byteswap the value when we read and when we > + * write. If the relocation is of type R_ARM_ABS32 we assume that it is a relocation for > + * a piece of data and we do not perform byteswaps when such a relocation is applied. > + * This heuristic-based approach may break in the future. > + * > + * It seems that a more correct way to handle be8 in kernel modules is > + * to build the module without the --be8 option, but perform the instruction byteswap > + * when the module sections are loaded, after all relocations are applied. > + * In order to do such a byteswap we need to read all mapping symbols ($a, > + * $d, $t), sort them, and apply byteswaps for the $a and $t entries.
> + */ > +static inline u32 read_section_u32(Elf32_Shdr *dstsec, unsigned long loc) > +{ > + u32 retval = *(u32 *)loc; > + > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + if (dstsec->sh_flags& SHF_EXECINSTR) { > + retval = __swab32(retval); > + } > +#endif /* CONFIG_CPU_ENDIAN_BE8 */ > + return retval; > +} > + > +static inline void write_section_u32(Elf32_Shdr *dstsec, unsigned > long loc, u32 value) > +{ > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + if (dstsec->sh_flags& SHF_EXECINSTR) { > + value = __swab32(value); > + } > +#endif /* CONFIG_CPU_ENDIAN_BE8 */ > + *(u32 *) loc = value; > +} > + > +#ifdef CONFIG_THUMB2_KERNEL > +static inline u16 read_section_u16(Elf32_Shdr *dstsec, unsigned long loc) > +{ > + u16 retval = *(u16 *) loc; > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + if (dstsec->sh_flags& SHF_EXECINSTR) { > + retval = __swab16(retval); > + } > +#endif /* CONFIG_CPU_ENDIAN_BE8 */ > + return retval; > +} > + > +static inline void write_section_u16(Elf32_Shdr *dstsec, unsigned > long loc, u16 value) > +{ > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + if (dstsec->sh_flags& SHF_EXECINSTR) { > + value = __swab16(value); > + } > +#endif /* CONFIG_CPU_ENDIAN_BE8 */ > + *(u16 *) loc = value; > +} > +#endif /* CONFIG_THUMB2_KERNEL */ > + > int > apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex, > unsigned int relindex, struct module *module) > @@ -60,6 +119,7 @@ apply_relocate(Elf32_Shdr *sechdrs, cons > Elf32_Sym *sym; > const char *symname; > s32 offset; > + u32 loc_u32; > #ifdef CONFIG_THUMB2_KERNEL > u32 upper, lower, sign, j1, j2; > #endif > @@ -95,7 +155,8 @@ apply_relocate(Elf32_Shdr *sechdrs, cons > case R_ARM_PC24: > case R_ARM_CALL: > case R_ARM_JUMP24: > - offset = (*(u32 *)loc& 0x00ffffff)<< 2; > + loc_u32 = read_section_u32(dstsec, loc); > + offset = (loc_u32& 0x00ffffff)<< 2; > if (offset& 0x02000000) > offset -= 0x04000000; > > @@ -112,27 +173,33 @@ apply_relocate(Elf32_Shdr *sechdrs, cons > > offset>>= 2; > > - *(u32 *)loc&= 0xff000000; > - *(u32 *)loc |= 
offset& 0x00ffffff; > + loc_u32&= 0xff000000; > + loc_u32 |= offset& 0x00ffffff; > + write_section_u32(dstsec, loc, loc_u32); > break; > > - case R_ARM_V4BX: > + case R_ARM_V4BX: > + loc_u32 = read_section_u32(dstsec, loc); > /* Preserve Rm and the condition code. Alter > * other bits to re-code instruction as > * MOV PC,Rm. > */ > - *(u32 *)loc&= 0xf000000f; > - *(u32 *)loc |= 0x01a0f000; > + loc_u32&= 0xf000000f; > + loc_u32 |= 0x01a0f000; > + write_section_u32(dstsec, loc, loc_u32); > break; > > case R_ARM_PREL31: > - offset = *(u32 *)loc + sym->st_value - loc; > - *(u32 *)loc = offset& 0x7fffffff; > + loc_u32 = read_section_u32(dstsec, loc); > + offset = loc_u32 + sym->st_value - loc; > + loc_u32 = offset& 0x7fffffff; > + write_section_u32(dstsec, loc, loc_u32); > break; > > case R_ARM_MOVW_ABS_NC: > case R_ARM_MOVT_ABS: > - offset = *(u32 *)loc; > + loc_u32 = read_section_u32(dstsec, loc); > + offset = loc_u32; > offset = ((offset& 0xf0000)>> 4) | (offset& 0xfff); > offset = (offset ^ 0x8000) - 0x8000; > > @@ -140,16 +207,17 @@ apply_relocate(Elf32_Shdr *sechdrs, cons > if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS) > offset>>= 16; > > - *(u32 *)loc&= 0xfff0f000; > - *(u32 *)loc |= ((offset& 0xf000)<< 4) | > - (offset& 0x0fff); > + loc_u32&= 0xfff0f000; > + loc_u32 |= ((offset& 0xf000)<< 4) | > + (offset& 0x0fff); > + write_section_u32(dstsec, loc, loc_u32); > break; > > #ifdef CONFIG_THUMB2_KERNEL > case R_ARM_THM_CALL: > case R_ARM_THM_JUMP24: > - upper = *(u16 *)loc; > - lower = *(u16 *)(loc + 2); > + upper = read_section_u16(dstsec, loc); > + lower = read_section_u16(dstsec, loc + 2); > > /* > * 25 bit signed address range (Thumb-2 BL and B.W > @@ -198,17 +266,19 @@ apply_relocate(Elf32_Shdr *sechdrs, cons > sign = (offset>> 24)& 1; > j1 = sign ^ (~(offset>> 23)& 1); > j2 = sign ^ (~(offset>> 22)& 1); > - *(u16 *)loc = (u16)((upper& 0xf800) | (sign<< 10) | > - ((offset>> 12)& 0x03ff)); > - *(u16 *)(loc + 2) = (u16)((lower& 0xd000) | > - (j1<< 13) | (j2<< 
11) | > - ((offset>> 1)& 0x07ff)); > + write_section_u16(dstsec, loc, > + (u16)((upper& 0xf800) | > (sign<< 10) | > + ((offset>> 12)& 0x03ff))); > + write_section_u16(dstsec, loc + 2, > + (u16)((lower& 0xd000) | > + (j1<< 13) | (j2<< 11) | > + ((offset>> 1)& 0x07ff))); > break; > > case R_ARM_THM_MOVW_ABS_NC: > case R_ARM_THM_MOVT_ABS: > - upper = *(u16 *)loc; > - lower = *(u16 *)(loc + 2); > + upper = read_section_u16(dstsec, loc); > + lower = read_section_u16(dstsec, loc + 2); > > /* > * MOVT/MOVW instructions encoding in Thumb-2: > @@ -229,12 +299,14 @@ apply_relocate(Elf32_Shdr *sechdrs, cons > if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_ABS) > offset>>= 16; > > - *(u16 *)loc = (u16)((upper& 0xfbf0) | > - ((offset& 0xf000)>> 12) | > - ((offset& 0x0800)>> 1)); > - *(u16 *)(loc + 2) = (u16)((lower& 0x8f00) | > - ((offset& 0x0700)<< 4) | > - (offset& 0x00ff)); > + write_section_u16(dstsec, loc, > + (u16)((upper& 0xfbf0) | > + ((offset& 0xf000)>> 12) | > + ((offset& 0x0800)>> 1))); > + write_section_u16(dstsec, loc + 2, > + (u16)((lower& 0x8f00) | > + ((offset& 0x0700)<< 4) | > + (offset& 0x00ff))); > break; > #endif > I'm still not sure about this, I think the use of <asm/opcodes.h> is better here. Not sure if there's much point in checking the flags on the sections as I think the relocations pretty much define which endian they will be in. > Index: linux-linaro-tracking/arch/arm/mm/Kconfig > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/mm/Kconfig > +++ linux-linaro-tracking/arch/arm/mm/Kconfig > @@ -695,6 +695,15 @@ config CPU_ENDIAN_BE32 > help > Support for the BE-32 (big-endian) mode on pre-ARMv6 processors. 
> > +config LITTLE_ENDIAN_LOADER > + bool > + depends on CPU_BIG_ENDIAN > + default y > + help > + If Linux should operate in big-endian mode, but bootloader > (i.e u-boot) > + is still in little endian mode and passes boot information in little > + endian form set this flag > + > config CPU_HIGH_VECTOR > depends on !MMU&& CPU_CP15&& !CPU_ARM740T > bool "Select the High exception vector" > Index: linux-linaro-tracking/arch/arm/mm/alignment.c > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/mm/alignment.c > +++ linux-linaro-tracking/arch/arm/mm/alignment.c > @@ -792,6 +792,10 @@ do_alignment(unsigned long addr, unsigne > > regs->ARM_pc += isize; > > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + instr = __swab32(instr); /* little endian instruction */ > +#endif /* CONFIG_CPU_ENDIAN_BE8 */ > + > switch (CODING_BITS(instr)) { > case 0x00000000: /* 3.13.4 load/store instruction extensions */ > if (LDSTHD_I_BIT(instr)) See comments on <asm/opcodes.h> > Index: linux-linaro-tracking/arch/arm/kernel/head.S > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/kernel/head.S > +++ linux-linaro-tracking/arch/arm/kernel/head.S > @@ -89,6 +89,12 @@ ENTRY(stext) > @ ensure svc mode and all interrupts masked > safe_svcmode_maskall r9 > > +#ifdef CONFIG_LITTLE_ENDIAN_LOADER > + /* > + * Switch to bigendian mode > + */ > + setend be > +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ > mrc p15, 0, r9, c0, c0 @ get processor id > bl __lookup_processor_type @ r5=procinfo r9=cpuid > movs r10, r5 @ invalid processor (r5=0)? > @@ -356,6 +362,12 @@ ENTRY(secondary_startup) > #endif > safe_svcmode_maskall r9 > > +#ifdef CONFIG_LITTLE_ENDIAN_LOADER > + /* > + * Switch to bigendian mode > + */ > + setend be > +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ > mrc p15, 0, r9, c0, c0 @ get processor id > bl __lookup_processor_type > movs r10, r5 @ invalid processor? 
> @@ -427,6 +439,9 @@ __enable_mmu: > #ifdef CONFIG_CPU_ICACHE_DISABLE > bic r0, r0, #CR_I > #endif > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + orr r0, r0, #CR_EE @ big-endian page tables > +#endif > #ifdef CONFIG_ARM_LPAE > mov r5, #0 > mcrr p15, 0, r4, r5, c2 @ load TTBR0 > @@ -592,8 +607,14 @@ __fixup_a_pv_table: > b 2f > 1: add r7, r3 > ldrh ip, [r7, #2] > +#ifdef __ARMEB__ > + rev ip, ip @ byteswap instruction > +#endif /* __ARMEB__ */ > and ip, 0x8f00 > orr ip, r6 @ mask in offset bits 31-24 > +#ifdef __ARMEB__ > + rev ip, ip @ byteswap instruction > +#endif /* __ARMEB__ */ > strh ip, [r7, #2] > 2: cmp r4, r5 > ldrcc r7, [r4], #4 @ use branch for delay slot > @@ -602,8 +623,14 @@ __fixup_a_pv_table: > #else > b 2f > 1: ldr ip, [r7, r3] > +#ifdef __ARMEB__ > + rev ip, ip @ byteswap instruction > +#endif /* __ARMEB__ */ > bic ip, ip, #0x000000ff > orr ip, ip, r6 @ mask in offset bits 31-24 > +#ifdef __ARMEB__ > + rev ip, ip @ byteswap instruction > +#endif /* __ARMEB__ */ > str ip, [r7, r3] > 2: cmp r4, r5 > ldrcc r7, [r4], #4 @ use branch for delay slot Yeah, prefer my macros for doing this. 
> Index: linux-linaro-tracking/arch/arm/kernel/kprobes.c > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/kernel/kprobes.c > +++ linux-linaro-tracking/arch/arm/kernel/kprobes.c > @@ -66,14 +66,24 @@ int __kprobes arch_prepare_kprobe(struct > if (is_wide_instruction(insn)) { > insn<<= 16; > insn |= ((u16 *)addr)[1]; > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + insn = __swab32(insn); /* little endian instruction */ > +#endif > decode_insn = thumb32_kprobe_decode_insn; > - } else > - decode_insn = thumb16_kprobe_decode_insn; > + } else { > + decode_insn = thumb16_kprobe_decode_insn; > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + insn = __swab16(insn); /* little endian instruction */ > +#endif > + } > #else /* !CONFIG_THUMB2_KERNEL */ > thumb = false; > if (addr& 0x3) > return -EINVAL; > insn = *p->addr; > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + insn = __swab32(insn); /* little endian instruction */ > +#endif > decode_insn = arm_kprobe_decode_insn; > #endif > > @@ -89,7 +99,11 @@ int __kprobes arch_prepare_kprobe(struct > if (!p->ainsn.insn) > return -ENOMEM; > for (is = 0; is< MAX_INSN_SIZE; ++is) > +#ifndef CONFIG_CPU_ENDIAN_BE8 > p->ainsn.insn[is] = tmp_insn[is]; > +#else > + p->ainsn.insn[is] = __swab32(tmp_insn[is]); > /* little endian instruction */ > +#endif > flush_insns(p->ainsn.insn, > sizeof(p->ainsn.insn[0]) * MAX_INSN_SIZE); > p->ainsn.insn_fn = (kprobe_insn_fn_t *) > @@ -113,10 +127,17 @@ void __kprobes arch_arm_kprobe(struct kp > /* Remove any Thumb flag */ > addr = (void *)((uintptr_t)p->addr& ~1); > > - if (is_wide_instruction(p->opcode)) > + if (is_wide_instruction(p->opcode)) { > brkp = KPROBE_THUMB32_BREAKPOINT_INSTRUCTION; > - else > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + brkp = __swab32(brkp); > +#endif /* CONFIG_CPU_ENDIAN_BE8 */ > + } else { > brkp = KPROBE_THUMB16_BREAKPOINT_INSTRUCTION; > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + brkp = __swab16(brkp); > +#endif /* CONFIG_CPU_ENDIAN_BE8 */ > + } > } else { > 
kprobe_opcode_t insn = p->opcode; > > @@ -127,6 +148,10 @@ void __kprobes arch_arm_kprobe(struct kp > brkp |= 0xe0000000; /* Unconditional instruction */ > else > brkp |= insn& 0xf0000000; /* Copy condition from insn */ > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + brkp = __swab32(brkp); > +#endif /* CONFIG_CPU_ENDIAN_BE8 */ > + > } > > patch_text(addr, brkp); > @@ -144,8 +169,12 @@ int __kprobes __arch_disarm_kprobe(void > { > struct kprobe *kp = p; > void *addr = (void *)((uintptr_t)kp->addr& ~1); > + kprobe_opcode_t insn = kp->opcode; > > - __patch_text(addr, kp->opcode); > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + insn = __swab32(insn); /* little endian instruction */ > +#endif > + __patch_text(addr, insn); > > return 0; > __patch_text already does swapping, so a little concerned that it is being done twice in some places. > Index: linux-linaro-tracking/arch/arm/kernel/traps.c > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/kernel/traps.c > +++ linux-linaro-tracking/arch/arm/kernel/traps.c > @@ -426,6 +426,10 @@ asmlinkage void __exception do_undefinst > goto die_sig; > } > > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + instr = __swab32(instr); /* little endian instruction */ > +#endif /* CONFIG_CPU_ENDIAN_BE8 */ > + > if (call_undef_hook(regs, instr) == 0) > return; already fixed. 
> Index: linux-linaro-tracking/arch/arm/kernel/ftrace.c > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/kernel/ftrace.c > +++ linux-linaro-tracking/arch/arm/kernel/ftrace.c > @@ -100,10 +100,18 @@ static int ftrace_modify_code(unsigned l > if (probe_kernel_read(&replaced, (void *)pc, MCOUNT_INSN_SIZE)) > return -EFAULT; > > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + replaced = __swab32(replaced); /* little endian instruction */ > +#endif /* CONFIG_CPU_ENDIAN_BE8 */ > + > if (replaced != old) > return -EINVAL; > } > > +#ifdef CONFIG_CPU_ENDIAN_BE8 > + new = __swab32(new); /* little endian instruction */ > +#endif /* CONFIG_CPU_ENDIAN_BE8 */ > + > if (probe_kernel_write((void *)pc,&new, MCOUNT_INSN_SIZE)) > return -EPERM; Please see <asm/opcodes.h> for dealing with the op-codes like this. > Index: linux-linaro-tracking/arch/arm/kernel/atags_parse.c > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/kernel/atags_parse.c > +++ linux-linaro-tracking/arch/arm/kernel/atags_parse.c > @@ -216,6 +216,10 @@ struct machine_desc * __init setup_machi > if (tags->hdr.tag != ATAG_CORE) > convert_to_tag_list(tags); > #endif > +#ifdef CONFIG_LITTLE_ENDIAN_LOADER > + byteswap_tags(tags); > +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ > + > if (tags->hdr.tag != ATAG_CORE) { > early_print("Warning: Neither atags nor dtb found\n"); > tags = (struct tag *)&default_tags; > Index: linux-linaro-tracking/arch/arm/kernel/sleep.S I really don't like this way, it makes it difficult to deal with if you export this data to user-space to then do things like md5sum it. Also, if you're going to kexec() a new kernel, would you have to re-swap it back before this? 
> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h > =================================================================== > --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h > +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h > @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a > > __asm__ __volatile__("@ atomic64_add\n" > "1: ldrexd %0, %H0, [%3]\n" > +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ > " adds %0, %0, %4\n" > " adc %H0, %H0, %H4\n" > +#else > +" adds %H0, %H0, %H4\n" > +" adc %0, %0, %4\n" > +#endif > " strexd %1, %0, %H0, [%3]\n" > " teq %1, #0\n" > " bne 1b" > @@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6 > > __asm__ __volatile__("@ atomic64_add_return\n" > "1: ldrexd %0, %H0, [%3]\n" > +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ > " adds %0, %0, %4\n" > " adc %H0, %H0, %H4\n" > +#else > +" adds %H0, %H0, %H4\n" > +" adc %0, %0, %4\n" > +#endif > " strexd %1, %0, %H0, [%3]\n" > " teq %1, #0\n" > " bne 1b" > @@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a > > __asm__ __volatile__("@ atomic64_sub\n" > "1: ldrexd %0, %H0, [%3]\n" > +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ > " subs %0, %0, %4\n" > " sbc %H0, %H0, %H4\n" > +#else > +" subs %H0, %H0, %H4\n" > +" sbc %0, %0, %4\n" > +#endif > " strexd %1, %0, %H0, [%3]\n" > " teq %1, #0\n" > " bne 1b" > @@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6 > > __asm__ __volatile__("@ atomic64_sub_return\n" > "1: ldrexd %0, %H0, [%3]\n" > +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ > " subs %0, %0, %4\n" > " sbc %H0, %H0, %H4\n" > +#else > +" subs %H0, %H0, %H4\n" > +" sbc %0, %0, %4\n" > +#endif > " strexd %1, %0, %H0, [%3]\n" > " teq %1, #0\n" > " bne 1b" > @@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi > > __asm__ __volatile__("@ atomic64_dec_if_positive\n" > "1: ldrexd %0, %H0, [%3]\n" > +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ > " subs %0, %0, #1\n" > " sbc %H0, %H0, #0\n" > " 
teq %H0, #0\n" > +#else > +" subs %H0, %H0, #1\n" > +" sbc %0, %0, #0\n" > +" teq %0, #0\n" > +#endif > " bmi 2f\n" > " strexd %1, %0, %H0, [%3]\n" > " teq %1, #0\n" > @@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at > " teqeq %H0, %H5\n" > " moveq %1, #0\n" > " beq 2f\n" > +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ > " adds %0, %0, %6\n" > " adc %H0, %H0, %H6\n" > +#else > +" adds %H0, %H0, %H6\n" > +" adc %0, %0, %6\n" > +#endif > " strexd %2, %0, %H0, [%4]\n" > " teq %2, #0\n" > " bne 1b\n" I still believe that you're doing the wrong thing here and that these can be left well enough alone. Please give a concrete piece of code that can show they're actually doing the wrong thing. -- Ben Dooks http://www.codethink.co.uk/ Senior Engineer Codethink - Providing Genius ^ permalink raw reply [flat|nested] 13+ messages in thread
* updates for be8 patch series 2013-07-23 18:30 ` Ben Dooks @ 2013-07-24 1:06 ` Victor Kamensky 2013-07-24 9:36 ` Will Deacon 2013-07-24 10:46 ` Ben Dooks 2013-07-24 7:31 ` Victor Kamensky 1 sibling, 2 replies; 13+ messages in thread From: Victor Kamensky @ 2013-07-24 1:06 UTC (permalink / raw) To: linux-arm-kernel Hi Ben, Let me split off atomic64 from other issues, so we have focused topic here. Please see atomic64 issue test case below. Please try it in your setup. I've run on Pandaboard ES with my set of patches, but I believe it should be the same in your case. <snip> >> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h >> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h >> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a >> >> __asm__ __volatile__("@ atomic64_add\n" >> "1: ldrexd %0, %H0, [%3]\n" >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >> " adds %0, %0, %4\n" >> " adc %H0, %H0, %H4\n" >> +#else >> +" adds %H0, %H0, %H4\n" >> +" adc %0, %0, %4\n" >> +#endif >> " strexd %1, %0, %H0, [%3]\n" >> " teq %1, #0\n" >> " bne 1b" >> @@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6 >> >> __asm__ __volatile__("@ atomic64_add_return\n" >> "1: ldrexd %0, %H0, [%3]\n" >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >> " adds %0, %0, %4\n" >> " adc %H0, %H0, %H4\n" >> +#else >> +" adds %H0, %H0, %H4\n" >> +" adc %0, %0, %4\n" >> +#endif >> " strexd %1, %0, %H0, [%3]\n" >> " teq %1, #0\n" >> " bne 1b" >> @@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a >> >> __asm__ __volatile__("@ atomic64_sub\n" >> "1: ldrexd %0, %H0, [%3]\n" >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >> " subs %0, %0, %4\n" >> " sbc %H0, %H0, %H4\n" >> +#else >> +" subs %H0, %H0, %H4\n" >> +" sbc %0, %0, %4\n" >> +#endif >> " strexd %1, %0, %H0, [%3]\n" >> " teq %1, #0\n" >> " bne 1b" >> 
@@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6 >> >> __asm__ __volatile__("@ atomic64_sub_return\n" >> "1: ldrexd %0, %H0, [%3]\n" >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >> " subs %0, %0, %4\n" >> " sbc %H0, %H0, %H4\n" >> +#else >> +" subs %H0, %H0, %H4\n" >> +" sbc %0, %0, %4\n" >> +#endif >> " strexd %1, %0, %H0, [%3]\n" >> " teq %1, #0\n" >> " bne 1b" >> @@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi >> >> __asm__ __volatile__("@ atomic64_dec_if_positive\n" >> "1: ldrexd %0, %H0, [%3]\n" >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >> " subs %0, %0, #1\n" >> " sbc %H0, %H0, #0\n" >> " teq %H0, #0\n" >> +#else >> +" subs %H0, %H0, #1\n" >> +" sbc %0, %0, #0\n" >> +" teq %0, #0\n" >> +#endif >> " bmi 2f\n" >> " strexd %1, %0, %H0, [%3]\n" >> " teq %1, #0\n" >> @@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at >> " teqeq %H0, %H5\n" >> " moveq %1, #0\n" >> " beq 2f\n" >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >> " adds %0, %0, %6\n" >> " adc %H0, %H0, %H6\n" >> +#else >> +" adds %H0, %H0, %H6\n" >> +" adc %0, %0, %6\n" >> +#endif >> " strexd %2, %0, %H0, [%4]\n" >> " teq %2, #0\n" >> " bne 1b\n" > > > I still believe that you're doing the wrong thing here > and that these can be left well enough alone. Please give > a concrete piece of code that can show they're actually > doing the wrong thing. Disable CONFIG_GENERIC_ATOMIC64 ------------------------------- Make sure that CONFIG_GENERIC_ATOMIC64 is not set. Otherwise atomic64_add from lib/atomic64.c will be used and that one works correctly but this case does not use atomic64_XXX inline functions from arch/arm/include/asm/atomic.h Personally with my current kernel close to 3.10 I failed to do that and manage to get live kernel, so to illustrate the issue I will use my own copy of inline atomic64_add function. But effectively it will show the same problem that I tried to report. 
Compiler used ------------- gcc-linaro-arm-linux-gnueabihf-4.7-2013.01-20130125_linux Test module source ------------------ #include <linux/module.h> #include <linux/moduleparam.h> #include <linux/atomic.h> static inline void my_atomic64_add_original(u64 i, atomic64_t *v) { u64 result; unsigned long tmp; __asm__ __volatile__("@ atomic64_add\n" "1: ldrexd %0, %H0, [%3]\n" " adds %0, %0, %4\n" " adc %H0, %H0, %H4\n" " strexd %1, %0, %H0, [%3]\n" " teq %1, #0\n" " bne 1b" : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter) : "r" (&v->counter), "r" (i) : "cc"); } static inline void my_atomic64_add_fixed(u64 i, atomic64_t *v) { u64 result; unsigned long tmp; __asm__ __volatile__("@ atomic64_add\n" "1: ldrexd %0, %H0, [%3]\n" #ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ " adds %0, %0, %4\n" " adc %H0, %H0, %H4\n" #else " adds %H0, %H0, %H4\n" " adc %0, %0, %4\n" #endif " strexd %1, %0, %H0, [%3]\n" " teq %1, #0\n" " bne 1b" : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter) : "r" (&v->counter), "r" (i) : "cc"); } atomic64_t a = {0}; #define INCREMENT 0x40000000 void atomic_update_original (void) { my_atomic64_add_original(INCREMENT, &a); } void atomic_update_fixed (void) { my_atomic64_add_fixed(INCREMENT, &a); } void test_atomic64_original(void) { int i; printk("original atomic64_add\n"); atomic64_set(&a, 0); printk("a = %llx\n", a.counter); for (i = 0; i < 10; i++) { atomic_update_original(); printk("i = %2d: a = %llx\n", i, a.counter); } } void test_atomic64_fixed(void) { int i; printk("fixed atomic64_add\n"); atomic64_set(&a, 0); printk("a = %llx\n", a.counter); for (i = 0; i < 10; i++) { atomic_update_fixed(); printk("i = %2d: a = %llx\n", i, a.counter); } } int init_module (void) { test_atomic64_original(); test_atomic64_fixed(); return 0; } void cleanup_module(void) { } MODULE_LICENSE("GPL"); Test run -------- Compile and run module, observe in original code counter is not incremented correctly. 
root@genericarmv7ab:~# insmod atomic64.ko [ 73.041107] original atomic64_add [ 73.044647] a = 0 [ 73.046691] i = 0: a = 40000000 [ 73.050659] i = 1: a = 80000000 [ 73.054199] i = 2: a = c0000000 [ 73.057983] i = 3: a = 0 [ 73.060852] i = 4: a = 40000000 [ 73.064361] i = 5: a = 80000000 [ 73.067932] i = 6: a = c0000000 [ 73.071441] i = 7: a = 0 [ 73.074310] i = 8: a = 40000000 [ 73.077850] i = 9: a = 80000000 [ 73.081390] fixed atomic64_add [ 73.084625] a = 0 [ 73.086669] i = 0: a = 40000000 [ 73.090209] i = 1: a = 80000000 [ 73.093749] i = 2: a = c0000000 [ 73.097290] i = 3: a = 100000000 [ 73.100891] i = 4: a = 140000000 [ 73.104522] i = 5: a = 180000000 [ 73.108154] i = 6: a = 1c0000000 [ 73.111755] i = 7: a = 200000000 [ 73.115386] i = 8: a = 240000000 [ 73.119018] i = 9: a = 280000000 Disassemble of original code ---------------------------- (gdb) disassemble atomic_update_original Dump of assembler code for function atomic_update_original: 0x00000000 <+0>: push {r4} ; (str r4, [sp, #-4]!) 0x00000004 <+4>: mov r3, #1073741824 ; 0x40000000 0x00000008 <+8>: ldr r12, [pc, #32] ; 0x30 <atomic_update_original+48> 0x0000000c <+12>: mov r2, #0 0x00000010 <+16>: ldrexd r0, [r12] 0x00000014 <+20>: adds r0, r0, r2 0x00000018 <+24>: adc r1, r1, r3 0x0000001c <+28>: strexd r4, r0, [r12] 0x00000020 <+32>: teq r4, #0 0x00000024 <+36>: bne 0x10 <atomic_update_original+16> 0x00000028 <+40>: ldmfd sp!, {r4} 0x0000002c <+44>: bx lr 0x00000030 <+48>: andeq r0, r0, r0 End of assembler dump. Disassemble of fixed code ------------------------- (gdb) disassemble atomic_update_fixed Dump of assembler code for function atomic_update_fixed: 0x00000034 <+0>: push {r4} ; (str r4, [sp, #-4]!) 
0x00000038 <+4>: mov r3, #1073741824 ; 0x40000000 0x0000003c <+8>: ldr r12, [pc, #32] ; 0x64 <atomic_update_fixed+48> 0x00000040 <+12>: mov r2, #0 0x00000044 <+16>: ldrexd r0, [r12] 0x00000048 <+20>: adds r1, r1, r3 0x0000004c <+24>: adc r0, r0, r2 0x00000050 <+28>: strexd r4, r0, [r12] 0x00000054 <+32>: teq r4, #0 0x00000058 <+36>: bne 0x44 <atomic_update_fixed+16> 0x0000005c <+40>: ldmfd sp!, {r4} 0x00000060 <+44>: bx lr 0x00000064 <+48>: andeq r0, r0, r0 End of assembler dump. Fixed code explanation ---------------------- - $r2 has the 4 most significant bytes of the increment (0) - $r3 has the 4 least significant bytes of the increment (0x40000000) - Load the 'long long' at address $r12 into r0 and r1: $r0 gets the 4 most significant bytes (i.e. the 4 bytes at $r12), $r1 gets the 4 least significant bytes (i.e. the 4 bytes at $r12+4); loads are big endian, so $r0 and $r1 are read with the proper BE swap 0x00000044 <+16>: ldrexd r0, [r12] - Add $r3 to $r1 and store the result in $r1; the carry flag may be set - this adds the 4 least significant bytes of the 'long long' 0x00000048 <+20>: adds r1, r1, r3 - Add with carry the most significant parts of the increment and the counter 0x0000004c <+24>: adc r0, r0, r2 - Store the result back: $r0 goes to address $r12, $r1 goes to address $r12+4 0x00000050 <+28>: strexd r4, r0, [r12] Ldrd/strd --------- The issue boils down to the fact that the ldrexd/ldrd and strexd/strd instructions in big endian mode byteswap only within a 4-byte boundary; they do not swap the register numbers themselves. I.e. ldrexd rN, <addr> puts the swapped 4 bytes at address <addr> into rN, and the swapped 4 bytes at <addr+4> into rN+1. Compiler? --------- One may wonder whether we have a compiler bug here. And after all, what is the meaning of %0 vs %H0? I asked my local gcc expert for help. Here is what I got.
Please look at https://github.com/mirrors/gcc/blob/master/gcc/config/arm/arm.c and search for >'Q', 'R' and 'H'<. According to this code, '%HN' is always the reg+1 operand, regardless of whether it is a big endian system or not. This is in contrast to 'Q'. Better fix ---------- In fact I now believe the correct fix should use 'Q' for the least significant 4 bytes, 'R' for the most significant 4 bytes, and keep 'H' in the load/store instructions. It should look something like this: static inline void my_atomic64_add_fixed1(u64 i, atomic64_t *v) { u64 result; unsigned long tmp; __asm__ __volatile__("@ atomic64_add\n" "1: ldrexd %0, %H0, [%3]\n" " adds %Q0, %Q0, %Q4\n" " adc %R0, %R0, %R4\n" " strexd %1, %0, %H0, [%3]\n" " teq %1, #0\n" " bne 1b" : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter) : "r" (&v->counter), "r" (i) : "cc"); } It is better than my original fix because it does not need conditional compilation. I've tested it in the module; it works on both the LE and BE kernel variants. Thanks, Victor ^ permalink raw reply [flat|nested] 13+ messages in thread
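[The carry-ordering failure discussed in this message can be reproduced without ARM hardware. Below is a minimal Python model — not real ARM semantics, just the word-to-register assignment for big-endian ldrexd/strexd as described in this thread — showing why the unfixed adds/adc order loses the carry:]

```python
MASK32 = 0xFFFFFFFF

def ldrexd_be(mem64):
    # In BE mode, ldrexd rN, [addr] byte-swaps within each 4-byte word but
    # does not swap register numbers: rN gets the word at addr (the most
    # significant word of the 64-bit value), rN+1 the word at addr+4.
    return (mem64 >> 32) & MASK32, mem64 & MASK32   # -> (r0, r1)

def add64_original(counter, inc):
    # LE-order code run on BE: the 'adds'/'adc' pair treats r0 as the LOW
    # word, so the carry out of the real low-word add (in r1) is discarded.
    r0, r1 = ldrexd_be(counter)
    i_hi, i_lo = (inc >> 32) & MASK32, inc & MASK32
    s = r0 + i_hi                        # adds r0, r0, r2
    r0, carry = s & MASK32, s >> 32
    r1 = (r1 + i_lo + carry) & MASK32    # adc r1, r1, r3 (carry-out lost)
    return (r0 << 32) | r1

def add64_fixed(counter, inc):
    # Fixed order: add the low words first (r1), then adc into the high (r0).
    r0, r1 = ldrexd_be(counter)
    i_hi, i_lo = (inc >> 32) & MASK32, inc & MASK32
    s = r1 + i_lo                        # adds r1, r1, r3
    r1, carry = s & MASK32, s >> 32
    r0 = (r0 + i_hi + carry) & MASK32    # adc r0, r0, r2
    return (r0 << 32) | r1

a = b = 0
for i in range(4):
    a = add64_original(a, 0x40000000)
    b = add64_fixed(b, 0x40000000)
print(hex(a))  # 0x0 -- wraps at 32 bits
print(hex(b))  # 0x100000000 -- carry propagated correctly
```

[This matches the i = 3 lines of the insmod output earlier in the thread: the original code wraps back to 0 while the fixed code reaches 0x100000000.]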
* updates for be8 patch series 2013-07-24 1:06 ` Victor Kamensky @ 2013-07-24 9:36 ` Will Deacon 2013-07-24 10:46 ` Ben Dooks 1 sibling, 0 replies; 13+ messages in thread From: Will Deacon @ 2013-07-24 9:36 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jul 24, 2013 at 02:06:05AM +0100, Victor Kamensky wrote: > Hi Ben, > > Let me split off atomic64 from other issues, so we have focused topic here. > > Please see atomic64 issue test case below. Please try it in your setup. I've > run on Pandaboard ES with my set of patches, but I believe it should be the > same in your case. [...] > <snip> > > >> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h > >> =================================================================== > >> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h > >> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h > >> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a > >> > >> __asm__ __volatile__("@ atomic64_add\n" > >> "1: ldrexd %0, %H0, [%3]\n" > >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ > >> " adds %0, %0, %4\n" > >> " adc %H0, %H0, %H4\n" > >> +#else > >> +" adds %H0, %H0, %H4\n" > >> +" adc %0, %0, %4\n" > >> +#endif I thought you could just use the %Q and %R output modifiers to get the appropriate halves of the 64-bit operand, then you can avoid the #ifdef. Will ^ permalink raw reply [flat|nested] 13+ messages in thread
* updates for be8 patch series 2013-07-24 1:06 ` Victor Kamensky 2013-07-24 9:36 ` Will Deacon @ 2013-07-24 10:46 ` Ben Dooks 1 sibling, 0 replies; 13+ messages in thread From: Ben Dooks @ 2013-07-24 10:46 UTC (permalink / raw) To: linux-arm-kernel On 24/07/13 02:06, Victor Kamensky wrote: > Hi Ben, > > Let me split off atomic64 from other issues, so we have focused topic here. > > Please see atomic64 issue test case below. Please try it in your setup. I've > run on Pandaboard ES with my set of patches, but I believe it should be the > same in your case. > > <snip> > >>> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h >>> =================================================================== >>> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h >>> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h >>> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a >>> >>> __asm__ __volatile__("@ atomic64_add\n" >>> "1: ldrexd %0, %H0, [%3]\n" >>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >>> " adds %0, %0, %4\n" >>> " adc %H0, %H0, %H4\n" >>> +#else >>> +" adds %H0, %H0, %H4\n" >>> +" adc %0, %0, %4\n" >>> +#endif >>> " strexd %1, %0, %H0, [%3]\n" >>> " teq %1, #0\n" >>> " bne 1b" >>> @@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6 >>> >>> __asm__ __volatile__("@ atomic64_add_return\n" >>> "1: ldrexd %0, %H0, [%3]\n" >>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >>> " adds %0, %0, %4\n" >>> " adc %H0, %H0, %H4\n" >>> +#else >>> +" adds %H0, %H0, %H4\n" >>> +" adc %0, %0, %4\n" >>> +#endif >>> " strexd %1, %0, %H0, [%3]\n" >>> " teq %1, #0\n" >>> " bne 1b" >>> @@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a >>> >>> __asm__ __volatile__("@ atomic64_sub\n" >>> "1: ldrexd %0, %H0, [%3]\n" >>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >>> " subs %0, %0, %4\n" >>> " sbc %H0, %H0, %H4\n" >>> +#else >>> +" subs %H0, %H0, %H4\n" >>> +" sbc %0, %0, %4\n" >>> +#endif >>> " strexd %1, %0, %H0, 
[%3]\n" >>> " teq %1, #0\n" >>> " bne 1b" >>> @@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6 >>> >>> __asm__ __volatile__("@ atomic64_sub_return\n" >>> "1: ldrexd %0, %H0, [%3]\n" >>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >>> " subs %0, %0, %4\n" >>> " sbc %H0, %H0, %H4\n" >>> +#else >>> +" subs %H0, %H0, %H4\n" >>> +" sbc %0, %0, %4\n" >>> +#endif >>> " strexd %1, %0, %H0, [%3]\n" >>> " teq %1, #0\n" >>> " bne 1b" >>> @@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi >>> >>> __asm__ __volatile__("@ atomic64_dec_if_positive\n" >>> "1: ldrexd %0, %H0, [%3]\n" >>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >>> " subs %0, %0, #1\n" >>> " sbc %H0, %H0, #0\n" >>> " teq %H0, #0\n" >>> +#else >>> +" subs %H0, %H0, #1\n" >>> +" sbc %0, %0, #0\n" >>> +" teq %0, #0\n" >>> +#endif >>> " bmi 2f\n" >>> " strexd %1, %0, %H0, [%3]\n" >>> " teq %1, #0\n" >>> @@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at >>> " teqeq %H0, %H5\n" >>> " moveq %1, #0\n" >>> " beq 2f\n" >>> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >>> " adds %0, %0, %6\n" >>> " adc %H0, %H0, %H6\n" >>> +#else >>> +" adds %H0, %H0, %H6\n" >>> +" adc %0, %0, %6\n" >>> +#endif >>> " strexd %2, %0, %H0, [%4]\n" >>> " teq %2, #0\n" >>> " bne 1b\n" >> >> >> I still believe that you're doing the wrong thing here >> and that these can be left well enough alone. Please give >> a concrete piece of code that can show they're actually >> doing the wrong thing. > > Disable CONFIG_GENERIC_ATOMIC64 > ------------------------------- > > Make sure that CONFIG_GENERIC_ATOMIC64 is not set. Otherwise atomic64_add from > lib/atomic64.c will be used and that one works correctly but this case does > not use atomic64_XXX inline functions from arch/arm/include/asm/atomic.h > > Personally with my current kernel close to 3.10 I failed to do that and manage > to get live kernel, so to illustrate the issue I will use my own copy of > inline atomic64_add function. 
But effectively it will show the same problem > that I tried to report. the kconfig has: select GENERIC_ATOMIC64 if (CPU_V7M || CPU_V6 || !CPU_32v6K || !AEABI) which means that it is possible that the generic one is being used. > Compiler used > ------------- > > gcc-linaro-arm-linux-gnueabihf-4.7-2013.01-20130125_linux > > Test module source > ------------------ > > #include<linux/module.h> > #include<linux/moduleparam.h> > #include<linux/atomic.h> > > static inline void my_atomic64_add_original(u64 i, atomic64_t *v) > { > u64 result; > unsigned long tmp; > > __asm__ __volatile__("@ atomic64_add\n" > "1: ldrexd %0, %H0, [%3]\n" > " adds %0, %0, %4\n" > " adc %H0, %H0, %H4\n" > " strexd %1, %0, %H0, [%3]\n" > " teq %1, #0\n" > " bne 1b" > : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter) > : "r" (&v->counter), "r" (i) > : "cc"); > } > > static inline void my_atomic64_add_fixed(u64 i, atomic64_t *v) > { > u64 result; > unsigned long tmp; > > __asm__ __volatile__("@ atomic64_add\n" > "1: ldrexd %0, %H0, [%3]\n" > #ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ > " adds %0, %0, %4\n" > " adc %H0, %H0, %H4\n" > #else > " adds %H0, %H0, %H4\n" > " adc %0, %0, %4\n" > #endif > " strexd %1, %0, %H0, [%3]\n" > " teq %1, #0\n" > " bne 1b" > : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter) > : "r" (&v->counter), "r" (i) > : "cc"); > } > > > atomic64_t a = {0}; > > #define INCREMENT 0x40000000 > > void atomic_update_original (void) > { > my_atomic64_add_original(INCREMENT,&a); > } > > void atomic_update_fixed (void) > { > my_atomic64_add_fixed(INCREMENT,&a); > } > > void test_atomic64_original(void) > { > int i; > > printk("original atomic64_add\n"); > > atomic64_set(&a, 0); > > printk("a = %llx\n", a.counter); > > for (i = 0; i< 10; i++) { > atomic_update_original(); > printk("i = %2d: a = %llx\n", i, a.counter); > } > } > > void test_atomic64_fixed(void) > { > int i; > > printk("fixed atomic64_add\n"); > > atomic64_set(&a, 0); > > printk("a = %llx\n", a.counter); > > for 
(i = 0; i< 10; i++) { > atomic_update_fixed(); > printk("i = %2d: a = %llx\n", i, a.counter); > } > } > > int > init_module (void) > { > test_atomic64_original(); > test_atomic64_fixed(); > return 0; > } > > void > cleanup_module(void) > { > } > > MODULE_LICENSE("GPL"); > > > Test run > -------- > > Compile and run module, observe in original code counter is not incremented > correctly. > > root at genericarmv7ab:~# insmod atomic64.ko > [ 73.041107] original atomic64_add > [ 73.044647] a = 0 > [ 73.046691] i = 0: a = 40000000 > [ 73.050659] i = 1: a = 80000000 > [ 73.054199] i = 2: a = c0000000 > [ 73.057983] i = 3: a = 0 > [ 73.060852] i = 4: a = 40000000 > [ 73.064361] i = 5: a = 80000000 > [ 73.067932] i = 6: a = c0000000 > [ 73.071441] i = 7: a = 0 > [ 73.074310] i = 8: a = 40000000 > [ 73.077850] i = 9: a = 80000000 > [ 73.081390] fixed atomic64_add > [ 73.084625] a = 0 > [ 73.086669] i = 0: a = 40000000 > [ 73.090209] i = 1: a = 80000000 > [ 73.093749] i = 2: a = c0000000 > [ 73.097290] i = 3: a = 100000000 > [ 73.100891] i = 4: a = 140000000 > [ 73.104522] i = 5: a = 180000000 > [ 73.108154] i = 6: a = 1c0000000 > [ 73.111755] i = 7: a = 200000000 > [ 73.115386] i = 8: a = 240000000 > [ 73.119018] i = 9: a = 280000000 > > > Disassemble of original code > ---------------------------- > > (gdb) disassemble atomic_update_original > Dump of assembler code for function atomic_update_original: > 0x00000000<+0>: push {r4} ; (str r4, [sp, #-4]!) > 0x00000004<+4>: mov r3, #1073741824 ; 0x40000000 > 0x00000008<+8>: ldr r12, [pc, #32] ; 0x30 > <atomic_update_original+48> > 0x0000000c<+12>: mov r2, #0 > 0x00000010<+16>: ldrexd r0, [r12] > 0x00000014<+20>: adds r0, r0, r2 > 0x00000018<+24>: adc r1, r1, r3 > 0x0000001c<+28>: strexd r4, r0, [r12] > 0x00000020<+32>: teq r4, #0 > 0x00000024<+36>: bne 0x10<atomic_update_original+16> > 0x00000028<+40>: ldmfd sp!, {r4} > 0x0000002c<+44>: bx lr > 0x00000030<+48>: andeq r0, r0, r0 > End of assembler dump. 
> > Disassemble of fixed code > ------------------------- > > (gdb) disassemble atomic_update_fixed > Dump of assembler code for function atomic_update_fixed: > 0x00000034<+0>: push {r4} ; (str r4, [sp, #-4]!) > 0x00000038<+4>: mov r3, #1073741824 ; 0x40000000 > 0x0000003c<+8>: ldr r12, [pc, #32] ; 0x64<atomic_update_fixed+48> > 0x00000040<+12>: mov r2, #0 > 0x00000044<+16>: ldrexd r0, [r12] > 0x00000048<+20>: adds r1, r1, r3 > 0x0000004c<+24>: adc r0, r0, r2 > 0x00000050<+28>: strexd r4, r0, [r12] > 0x00000054<+32>: teq r4, #0 > 0x00000058<+36>: bne 0x44<atomic_update_fixed+16> > 0x0000005c<+40>: ldmfd sp!, {r4} > 0x00000060<+44>: bx lr > 0x00000064<+48>: andeq r0, r0, r0 > End of assembler dump. > > Fixed code explanation > ---------------------- > > - $r2 has 4 most significant bytes of increement (0) > - $r3 has 4 least significant bytes of increement (0x40000000) > > - Load from address at $r12 'long long' into r0 and r1 > $r0 will get 4 most significant bytes (i.e 4 bytes at $r12) > $r1 will get 4 least significan bytes (i.e 4 bytes at $r12+4) > loads are big endian, so $r0 and $r0 will be read with proper be swap > > 0x00000044<+16>: ldrexd r0, [r12] > > - add $r3 to $r1 store result into $r1, carry flag could be set > - adding 4 least sginificant bytes of 'long long' > > 0x00000048<+20>: adds r1, r1, r3 > > - adding with carry most siginificant parts of increment and counter > > 0x0000004c<+24>: adc r0, r0, r2 > > - Store result back > $r0 will to $r12 address > $r1 will to $r12+4 address > > 0x00000050<+28>: strexd r4, r0, [r12] > > Ldrd/strd > --------- > > Issue boils down to the fact that ldrexd/ldrd and strexd/strd instrucitons > in big endian mode will do byteswap only within 4 bytes boundary, but it will > not swap registers numbers itself. I.e ldrexd rN,<addr> puts swapped 4 bytes > at<addr> address into rN, and it puts swapped 4 bytes at<addr+4> into rN+1 Looking at the ARM ARM, the following is LDRD. 
if ConditionPassed() then EncodingSpecificOperations(); NullCheckIfThumbEE(15); address = if add then (Align(PC,4) + imm32) else (Align(PC,4) - imm32); if HaveLPAE() && address<2:0> == '000' then data = MemA[Address,8]; if BigEndian() then R[t] = data<63:32>; R[t2] = data<31:0>; else R[t] = data<31:0>; R[t2] = data<63:32>; else R[t] = MemA[address,4]; R[t2] = MemA[address+4,4]; I thought it always had R[t] as lowest and R[t2] as the highest. > Compiler? > --------- > > One may thing whether we have compiler bug here. And after all, what is the > meaning of %0 vs %H0? I asked my local gcc expert for the help. Here what > I got. Please look at > > https://github.com/mirrors/gcc/blob/master/gcc/config/arm/arm.c > > and search for>'Q', 'R' and 'H'< > > according to this page '%HN' is always reg+1 operand regardless whether it is > big endian system or not. It is opposite to 'Q'. > > Better fix > ---------- > > In fact now I believe correct fix should use 'Q' for least siginificant 4 > bytes, 'R' for most siginificant 4 bytes and 'H' is used in load/store > instructions. > > It should look something like this: > > static inline void my_atomic64_add_fixed1(u64 i, atomic64_t *v) > { > u64 result; > unsigned long tmp; > > __asm__ __volatile__("@ atomic64_add\n" > "1: ldrexd %0, %H0, [%3]\n" > " adds %Q0, %Q0, %Q4\n" > " adc %R0, %R0, %R4\n" > " strexd %1, %0, %H0, [%3]\n" > " teq %1, #0\n" > " bne 1b" > : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter) > : "r" (&v->counter), "r" (i) > : "cc"); > } > > It is better that my original fix because it does not have conditional > compilation. I've tested it in the module, it works on both LE and BE kernel > variants. If you want to make a patch with this in, I can add it to my series ready to get merged. -- Ben Dooks http://www.codethink.co.uk/ Senior Engineer Codethink - Providing Genius ^ permalink raw reply [flat|nested] 13+ messages in thread
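[For reference, the behaviour of the three operand modifiers discussed above can be stated as a small rule. The following Python sketch models the 'Q', 'R' and 'H' cases as read from gcc/config/arm/arm.c in this thread — treat the exact big-endian mapping as an assumption to double-check against your gcc version:]

```python
def gcc_arm_modifier(rn, modifier, big_endian):
    """Which register of the pair {rN, rN+1} holding a 64-bit value an
    ARM inline-asm operand modifier selects (model of the 'Q'/'R'/'H'
    cases in gcc/config/arm/arm.c)."""
    if modifier == 'H':                        # always the second register
        return rn + 1
    if modifier == 'Q':                        # least significant word
        return rn + 1 if big_endian else rn
    if modifier == 'R':                        # most significant word
        return rn if big_endian else rn + 1
    raise ValueError("unknown modifier: %r" % modifier)

# Little endian: %Q0 -> r0 (low), %R0 -> r1 (high), %H0 -> r1
print([gcc_arm_modifier(0, m, False) for m in 'QRH'])  # [0, 1, 1]
# Big endian:    %Q0 -> r1 (low), %R0 -> r0 (high), %H0 -> r1
print([gcc_arm_modifier(0, m, True) for m in 'QRH'])   # [1, 0, 1]
```

[This is why the "adds %Q0, %Q0, %Q4" / "adc %R0, %R0, %R4" form needs no #ifdef: 'Q' and 'R' select by significance on either endianness, while %H0 still names the second register of the pair for the ldrexd/strexd operand syntax.]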
* updates for be8 patch series 2013-07-23 18:30 ` Ben Dooks 2013-07-24 1:06 ` Victor Kamensky @ 2013-07-24 7:31 ` Victor Kamensky 2013-07-24 21:24 ` Victor Kamensky 1 sibling, 1 reply; 13+ messages in thread From: Victor Kamensky @ 2013-07-24 7:31 UTC (permalink / raw) To: linux-arm-kernel Hi Ben, Please see inline On 23 July 2013 11:30, Ben Dooks <ben.dooks@codethink.co.uk> wrote: > On 23/07/13 19:03, Victor Kamensky wrote: >> >> Hi Ben, >> >> Sorry for multiple post. I am new to this, and I could not get mailing >> list to accept my original email. >> For sake of others so people can see discussed patch, here it goes >> again with patch file inlined >> now. Hope it will this time >> >> Wrt BE8 little endian instructions you will need to fix couple more >> places: >> ftrace arch/arm/kernel/ftrace.c >> kprobes arch/arm/kernel/kprobes.c >> Also in big endian mode atomic64_xxx function will need fixes, otherwise >> perf >> counters will be truncated on max of 32bit values >> atomic64 arch/arm/include/asm/atomic.h >> >> I've attached board independent (core) patch from my work that made >> Pandaboard ES >> kernel run in BE mode. You can check my version of fixes for above issues >> in the >> patch. Look for corresponding files changes. Please rework them >> according to style >> and rules of your patch series. Note the patch itself contains fixes >> for few other >> issues that already fixed in your series. I'll cross check and compare >> individual >> areas latter. I think you may find attached patch interesting as well, >> it was developed >> independent from yours but matches its very well. >> >> Wrt of testing: ftrace was tested on Pandaboard ES, krprobes changes were >> tested >> with SystemTap at some point of patch version on Pandaboard ES (not >> thumb mode though), >> also it may deteriorated while ported across different kernel version, >> good idea to >> rested last patch. atomic64 were tested on Pandaboard ES and Marvell >> Armada XP. 
>> >> Thanks, >> Victor > > > First notes, please always format your emails (where possible) to > under 80 characters per line so that they do not get mangled when > viewed on limited screens (for example, I often read email via a > 24x80 terminal) Yes, I understand. Sorry about this. Please bear with me a bit. I am new to Linaro ... my second week has just started. I am still learning the tools used here, and still struggling with the default email client. > Second note, when producing patches, it is better to try and split > them into a series of patches to reach a goal than just one large > patch. It keeps a lot of useful information (see patch comments) > and it means that if there is a problem, then bits can be lifted > out to check. Yes, I understand. Indeed, multiple patches is the best form. But I did not really mean to contribute these patches; I just wanted to throw the patch over the wall for reference. I believe your series largely covers the same things, in a more accurate way, and should be continued. I plan to review your series and will try to integrate it and run it on Pandaboard ES. Note that for this testing I would use a big patch making the TI OMAP4 drivers endian neutral. It will take me several days. If I find anything I will let you know across these days. Note that my original experiment was done against the TI 3.1 kernel and after that was carried forward without great care, which may explain a few discrepancies below and my blunder with the ftrace comment.
> > >> Index: linux-linaro-tracking/arch/arm/Makefile >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/Makefile >> +++ linux-linaro-tracking/arch/arm/Makefile >> @@ -16,6 +16,7 @@ LDFLAGS := >> LDFLAGS_vmlinux :=-p --no-undefined -X >> ifeq ($(CONFIG_CPU_ENDIAN_BE8),y) >> LDFLAGS_vmlinux += --be8 >> +KBUILD_LDFLAGS_MODULE += --be8 >> endif >> >> OBJCOPYFLAGS :=-O binary -R .comment -S >> @@ -43,12 +44,19 @@ ifeq ($(CONFIG_CC_STACKPROTECTOR),y) >> KBUILD_CFLAGS +=-fstack-protector >> endif >> >> +# Passing endian compilation flag in both CPPFLAGS ad CFLAGS. There are >> +# places where CFLAGS alone are used (cmd_record_mcount). And it has to >> +# be in CPPFLAGS because it affects what macros (__ARMEB__, >> __BYTE_ORDER__, >> +# __FLOAT_WORD_ORDER__) are defined. So replication of flag seems to be >> +# good compromise >> ifeq ($(CONFIG_CPU_BIG_ENDIAN),y) >> KBUILD_CPPFLAGS += -mbig-endian >> +KBUILD_CFLAGS += -mbig-endian >> AS += -EB >> LD += -EB >> else >> KBUILD_CPPFLAGS += -mlittle-endian >> +KBUILD_CFLAGS += -mlittle-endian >> AS += -EL >> LD += -EL >> endif > > > See the comment below. > > >> Index: linux-linaro-tracking/arch/arm/boot/compressed/Makefile >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/boot/compressed/Makefile >> +++ linux-linaro-tracking/arch/arm/boot/compressed/Makefile >> @@ -138,6 +138,24 @@ endif >> ifeq ($(CONFIG_CPU_ENDIAN_BE8),y) >> LDFLAGS_vmlinux += --be8 >> endif >> + >> +# Passing endian compilation flag in both CPPFLAGS ad CFLAGS. There are >> +# places where CFLAGS alone are used (cmd_record_mcount). And it has to >> +# be in CPPFLAGS because it affects what macros (__ARMEB__, >> __BYTE_ORDER__, >> +# __FLOAT_WORD_ORDER__) are defined. 
So replication of flag seems to be >> +# good compromise >> +ifeq ($(CONFIG_CPU_BIG_ENDIAN),y) >> +KBUILD_CPPFLAGS += -mbig-endian >> +KBUILD_CFLAGS += -mbig-endian >> +AS += -EB >> +LD += -EB >> +else >> +KBUILD_CPPFLAGS += -mlittle-endian >> +KBUILD_CFLAGS += -mlittle-endian >> +AS += -EL >> +LD += -EL >> +endif >> + >> # ? >> LDFLAGS_vmlinux += -p >> # Report unresolved symbol references > > > I've not been bitten by this one. Is there anyone else who can comment > on KBUILD_CFLAGS vs KBUILD_CPPFLAGS? > > I had assumed that arch/arm/boot/compressed/Makefile would have > inherited the flags from the arch/arm/Makefile. Someone else would > be better qualified to comment on this. Let's ignore this. I don't think it is relevant now. There was probably some odd situation in old kernel. > >> Index: linux-linaro-tracking/arch/arm/boot/compressed/head.S >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/boot/compressed/head.S >> +++ linux-linaro-tracking/arch/arm/boot/compressed/head.S >> @@ -141,6 +141,12 @@ start: >> #endif >> mov r7, r1 @ save architecture ID >> mov r8, r2 @ save atags pointer >> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER >> + /* >> + * Switch to bigendian mode >> + */ >> + setend be >> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ >> >> #ifndef __ARM_ARCH_2__ >> /* >> Index: linux-linaro-tracking/arch/arm/include/asm/setup.h >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/include/asm/setup.h >> +++ linux-linaro-tracking/arch/arm/include/asm/setup.h >> @@ -53,4 +53,8 @@ extern int arm_add_memory(phys_addr_t st >> extern void early_print(const char *str, ...); >> extern void dump_machine_table(void); >> >> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER >> +void byteswap_tags(struct tag *t); >> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ >> + >> #endif > > > You could have done > > #ifdef CONFIG_LITTLE_ENDIAN_LOADER > void byteswap_tags(struct tag 
*t); > #else > static inline void byteswap_tags(struct tag *t) { } > #endif /* CONFIG_LITTLE_ENDIAN_LOADER */ > > > >> Index: linux-linaro-tracking/arch/arm/kernel/Makefile >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/kernel/Makefile >> +++ linux-linaro-tracking/arch/arm/kernel/Makefile >> @@ -22,6 +22,7 @@ obj-y := elf.o entry-armv.o entry-commo >> obj-$(CONFIG_ATAGS) += atags_parse.o >> obj-$(CONFIG_ATAGS_PROC) += atags_proc.o >> obj-$(CONFIG_DEPRECATED_PARAM_STRUCT) += atags_compat.o >> +obj-$(CONFIG_LITTLE_ENDIAN_LOADER) += atags_byteswap.o >> >> obj-$(CONFIG_OC_ETM) += etm.o >> obj-$(CONFIG_CPU_IDLE) += cpuidle.o >> Index: linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c >> =================================================================== >> --- /dev/null >> +++ linux-linaro-tracking/arch/arm/kernel/atags_byteswap.c >> @@ -0,0 +1,119 @@ >> +#include<linux/slab.h> >> +#include<linux/proc_fs.h> >> +#include<linux/swab.h> >> +#include<asm/setup.h> >> +#include<asm/types.h> >> +#include<asm/page.h> >> + >> +#define byteswap32(x) (x) = __swab32((x)) >> +#define byteswap16(x) (x) = __swab16((x)) >> + >> +static void __init byteswap_tag_core(struct tag *tag) >> +{ >> + byteswap32(tag->u.core.flags); >> + byteswap32(tag->u.core.pagesize); >> + byteswap32(tag->u.core.rootdev); >> +} >> + >> +static void __init byteswap_tag_mem32(struct tag *tag) >> +{ >> + byteswap32(tag->u.mem.size); >> + byteswap32(tag->u.mem.start); >> +} >> + >> +static void __init byteswap_tag_videotext(struct tag *tag) >> +{ >> + byteswap16(tag->u.videotext.video_page); >> + byteswap16(tag->u.videotext.video_ega_bx); >> + byteswap16(tag->u.videotext.video_points); >> +} >> + >> +static void __init byteswap_tag_ramdisk(struct tag *tag) >> +{ >> + byteswap32(tag->u.ramdisk.flags); >> + byteswap32(tag->u.ramdisk.size); >> + byteswap32(tag->u.ramdisk.start); >> +} >> + >> +static void __init byteswap_tag_initrd(struct tag 
*tag) >> +{ >> + byteswap32(tag->u.initrd.start); >> + byteswap32(tag->u.initrd.size); >> +} >> + >> +static void __init byteswap_tag_serialnr(struct tag *tag) >> +{ >> + byteswap32(tag->u.serialnr.low); >> + byteswap32(tag->u.serialnr.high); >> +} >> + >> +static void __init byteswap_tag_revision(struct tag *tag) >> +{ >> + byteswap32(tag->u.revision.rev); >> +} >> + >> +static void __init byteswap_tag_videolfb(struct tag *tag) >> +{ >> + byteswap16(tag->u.videolfb.lfb_width); >> + byteswap16(tag->u.videolfb.lfb_height); >> + byteswap16(tag->u.videolfb.lfb_depth); >> + byteswap16(tag->u.videolfb.lfb_linelength); >> + byteswap32(tag->u.videolfb.lfb_base); >> + byteswap32(tag->u.videolfb.lfb_size); >> +} >> + >> +static void __init byteswap_tag_acorn(struct tag *tag) >> +{ >> + byteswap32(tag->u.acorn.memc_control_reg); >> + byteswap32(tag->u.acorn.vram_pages); >> +} >> + >> +static void __init byteswap_tag_memclk(struct tag *tag) >> +{ >> + byteswap32(tag->u.memclk.fmemclk); >> +} >> + >> +void __init byteswap_tags(struct tag *t) >> +{ >> + for (; t->hdr.size; t = tag_next(t)) { >> + byteswap32(t->hdr.size); >> + byteswap32(t->hdr.tag); >> + switch(t->hdr.tag) { >> + case ATAG_CORE: >> + byteswap_tag_core(t); >> + break; >> + case ATAG_MEM: >> + byteswap_tag_mem32(t); >> + break; >> + case ATAG_VIDEOTEXT: >> + byteswap_tag_videotext(t); >> + break; >> + case ATAG_RAMDISK: >> + byteswap_tag_ramdisk(t); >> + break; >> + case ATAG_INITRD2: >> + byteswap_tag_initrd(t); >> + break; >> + case ATAG_SERIAL: >> + byteswap_tag_serialnr(t); >> + break; >> + case ATAG_REVISION: >> + byteswap_tag_revision(t); >> + break; >> + case ATAG_VIDEOLFB: >> + byteswap_tag_videolfb(t); >> + break; >> + case ATAG_CMDLINE: >> + break; >> + case ATAG_ACORN: >> + byteswap_tag_acorn(t); >> + break; >> + case ATAG_MEMCLK: >> + byteswap_tag_memclk(t); >> + break; >> + default: >> + /* Need to complain? 
*/ >> + break; >> + } >> + } >> +} > > > I think swapping on read is a better idea, as it leaves > these in their original format. > > PS, looks like you either have a formatting issue with > your email client or your code. Did this pass checkpatch? > > >> Index: linux-linaro-tracking/arch/arm/kernel/head-common.S >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/kernel/head-common.S >> +++ linux-linaro-tracking/arch/arm/kernel/head-common.S >> @@ -48,6 +48,9 @@ __vet_atags: >> bne 1f >> >> ldr r5, [r2, #0] >> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER >> + rev r5, r5 >> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ >> #ifdef CONFIG_OF_FLATTREE >> ldr r6, =OF_DT_MAGIC @ is it a DTB? >> cmp r5, r6 >> @@ -57,6 +60,9 @@ __vet_atags: >> cmpne r5, #ATAG_CORE_SIZE_EMPTY >> bne 1f >> ldr r5, [r2, #4] >> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER >> + rev r5, r5 >> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ >> ldr r6, =ATAG_CORE >> cmp r5, r6 >> bne 1f > > > bit-untidy. also, does OF_DT_MAGIC get changed by the > endian-ness, or is that somewhere else in the head code? > >> Index: linux-linaro-tracking/arch/arm/kernel/module.c >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/kernel/module.c >> +++ linux-linaro-tracking/arch/arm/kernel/module.c >> @@ -45,6 +45,65 @@ void *module_alloc(unsigned long size) >> } >> #endif >> >> +/* >> + * For relocations in exectuable sections if we configured with >> + * CONFIG_CPU_ENDIAN_BE8 we assume that code is in little endian byteswap >> + * form already and we need to byteswap value when we read and when we >> + * write. If relocaton R_ARM_ABS32 type we assume that it is relocation >> for >> + * data piece and we do not perform byteswaps when such relocation >> applied. >> + * This herustic based approach may break in future. 
>> + * >> + * It seems that more correct way to handle be8 in kernel module is >> + * to build module without --be8 option, but perform instruction byteswap >> + * when module sections are loaded, after all relocation are applied. >> + * In order to do such byteswap we need to read all mappings symbols ($a, >> + * $d, $t), sort them, and apply byteswaps for $a and $t entries. >> + */ >> +static inline u32 read_section_u32(Elf32_Shdr *dstsec, unsigned long loc) >> +{ >> + u32 retval = *(u32 *)loc; >> + >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + if (dstsec->sh_flags& SHF_EXECINSTR) { >> >> + retval = __swab32(retval); >> + } >> +#endif /* CONFIG_CPU_ENDIAN_BE8 */ >> + return retval; >> +} >> + >> +static inline void write_section_u32(Elf32_Shdr *dstsec, unsigned >> long loc, u32 value) >> +{ >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + if (dstsec->sh_flags& SHF_EXECINSTR) { >> >> + value = __swab32(value); >> + } >> +#endif /* CONFIG_CPU_ENDIAN_BE8 */ >> + *(u32 *) loc = value; >> +} >> + >> +#ifdef CONFIG_THUMB2_KERNEL >> +static inline u16 read_section_u16(Elf32_Shdr *dstsec, unsigned long loc) >> +{ >> + u16 retval = *(u16 *) loc; >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + if (dstsec->sh_flags& SHF_EXECINSTR) { >> >> + retval = __swab16(retval); >> + } >> +#endif /* CONFIG_CPU_ENDIAN_BE8 */ >> + return retval; >> +} >> + >> +static inline void write_section_u16(Elf32_Shdr *dstsec, unsigned >> long loc, u16 value) >> +{ >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + if (dstsec->sh_flags& SHF_EXECINSTR) { >> >> + value = __swab16(value); >> + } >> +#endif /* CONFIG_CPU_ENDIAN_BE8 */ >> + *(u16 *) loc = value; >> +} >> +#endif /* CONFIG_THUMB2_KERNEL */ >> + >> int >> apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int >> symindex, >> unsigned int relindex, struct module *module) >> @@ -60,6 +119,7 @@ apply_relocate(Elf32_Shdr *sechdrs, cons >> Elf32_Sym *sym; >> const char *symname; >> s32 offset; >> + u32 loc_u32; >> #ifdef CONFIG_THUMB2_KERNEL >> u32 upper, lower, sign, 
j1, j2; >> #endif >> @@ -95,7 +155,8 @@ apply_relocate(Elf32_Shdr *sechdrs, cons >> case R_ARM_PC24: >> case R_ARM_CALL: >> case R_ARM_JUMP24: >> - offset = (*(u32 *)loc& 0x00ffffff)<< 2; >> >> + loc_u32 = read_section_u32(dstsec, loc); >> + offset = (loc_u32& 0x00ffffff)<< 2; >> if (offset& 0x02000000) >> >> offset -= 0x04000000; >> >> @@ -112,27 +173,33 @@ apply_relocate(Elf32_Shdr *sechdrs, cons >> >> offset>>= 2; >> >> - *(u32 *)loc&= 0xff000000; >> >> - *(u32 *)loc |= offset& 0x00ffffff; >> + loc_u32&= 0xff000000; >> + loc_u32 |= offset& 0x00ffffff; >> >> + write_section_u32(dstsec, loc, loc_u32); >> break; >> >> - case R_ARM_V4BX: >> + case R_ARM_V4BX: >> + loc_u32 = read_section_u32(dstsec, loc); >> /* Preserve Rm and the condition code. Alter >> * other bits to re-code instruction as >> * MOV PC,Rm. >> */ >> - *(u32 *)loc&= 0xf000000f; >> >> - *(u32 *)loc |= 0x01a0f000; >> + loc_u32&= 0xf000000f; >> >> + loc_u32 |= 0x01a0f000; >> + write_section_u32(dstsec, loc, loc_u32); >> break; >> >> case R_ARM_PREL31: >> - offset = *(u32 *)loc + sym->st_value - loc; >> - *(u32 *)loc = offset& 0x7fffffff; >> >> + loc_u32 = read_section_u32(dstsec, loc); >> + offset = loc_u32 + sym->st_value - loc; >> + loc_u32 = offset& 0x7fffffff; >> >> + write_section_u32(dstsec, loc, loc_u32); >> break; >> >> case R_ARM_MOVW_ABS_NC: >> case R_ARM_MOVT_ABS: >> - offset = *(u32 *)loc; >> + loc_u32 = read_section_u32(dstsec, loc); >> + offset = loc_u32; >> offset = ((offset& 0xf0000)>> 4) | (offset& 0xfff); >> >> offset = (offset ^ 0x8000) - 0x8000; >> >> @@ -140,16 +207,17 @@ apply_relocate(Elf32_Shdr *sechdrs, cons >> if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS) >> offset>>= 16; >> >> - *(u32 *)loc&= 0xfff0f000; >> >> - *(u32 *)loc |= ((offset& 0xf000)<< 4) | >> - (offset& 0x0fff); >> + loc_u32&= 0xfff0f000; >> + loc_u32 |= ((offset& 0xf000)<< 4) | >> + (offset& 0x0fff); >> >> + write_section_u32(dstsec, loc, loc_u32); >> break; >> >> #ifdef CONFIG_THUMB2_KERNEL >> case 
R_ARM_THM_CALL: >> case R_ARM_THM_JUMP24: >> - upper = *(u16 *)loc; >> - lower = *(u16 *)(loc + 2); >> + upper = read_section_u16(dstsec, loc); >> + lower = read_section_u16(dstsec, loc + 2); >> >> /* >> * 25 bit signed address range (Thumb-2 BL and B.W >> @@ -198,17 +266,19 @@ apply_relocate(Elf32_Shdr *sechdrs, cons >> sign = (offset>> 24)& 1; >> j1 = sign ^ (~(offset>> 23)& 1); >> j2 = sign ^ (~(offset>> 22)& 1); >> - *(u16 *)loc = (u16)((upper& 0xf800) | (sign<< 10) | >> - ((offset>> 12)& 0x03ff)); >> - *(u16 *)(loc + 2) = (u16)((lower& 0xd000) | >> >> - (j1<< 13) | (j2<< 11) | >> - ((offset>> 1)& 0x07ff)); >> + write_section_u16(dstsec, loc, >> + (u16)((upper& 0xf800) | >> (sign<< 10) | >> + ((offset>> 12)& >> 0x03ff))); >> >> + write_section_u16(dstsec, loc + 2, >> + (u16)((lower& 0xd000) | >> >> + (j1<< 13) | (j2<< 11) | >> + ((offset>> 1)& >> 0x07ff))); >> >> break; >> >> case R_ARM_THM_MOVW_ABS_NC: >> case R_ARM_THM_MOVT_ABS: >> - upper = *(u16 *)loc; >> - lower = *(u16 *)(loc + 2); >> + upper = read_section_u16(dstsec, loc); >> + lower = read_section_u16(dstsec, loc + 2); >> >> /* >> * MOVT/MOVW instructions encoding in Thumb-2: >> @@ -229,12 +299,14 @@ apply_relocate(Elf32_Shdr *sechdrs, cons >> if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_ABS) >> offset>>= 16; >> >> - *(u16 *)loc = (u16)((upper& 0xfbf0) | >> - ((offset& 0xf000)>> 12) | >> - ((offset& 0x0800)>> 1)); >> >> - *(u16 *)(loc + 2) = (u16)((lower& 0x8f00) | >> - ((offset& 0x0700)<< 4) | >> - (offset& 0x00ff)); >> >> + write_section_u16(dstsec, loc, >> + (u16)((upper& 0xfbf0) | >> + ((offset& 0xf000)>> 12) >> | >> + ((offset& 0x0800)>> >> 1))); >> >> + write_section_u16(dstsec, loc + 2, >> + (u16)((lower& 0x8f00) | >> + ((offset& 0x0700)<< 4) >> | >> + (offset& 0x00ff))); >> break; >> #endif >> > > I'm still not sure about this, I think the use of <asm/opcodes.h> > is better here. 
Not sure if there's much point in checking the flags > on the sections as I think the relocations pretty much define which > endian they will be in. Wrt asm/opcodes.h, yes, agree it should be used now. I'll need to check why I used section attributes to guide byteswaps. I think it was done for a reason. We tested it with a bigger system on top with quite a few different KMLs; I think I ran into problems with some of them. Will need to dig back. As my comment says, ideally it would be better to use the $a, $d, $t symbols; that is what ld does with the --be8 option, but it may be too hard as a starting point. >> Index: linux-linaro-tracking/arch/arm/mm/Kconfig >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/mm/Kconfig >> +++ linux-linaro-tracking/arch/arm/mm/Kconfig >> @@ -695,6 +695,15 @@ config CPU_ENDIAN_BE32 >> help >> Support for the BE-32 (big-endian) mode on pre-ARMv6 processors. >> >> +config LITTLE_ENDIAN_LOADER >> + bool >> + depends on CPU_BIG_ENDIAN >> + default y >> + help >> + If Linux should operate in big-endian mode, but bootloader >> (i.e u-boot) >> + is still in little endian mode and passes boot information in >> little >> + endian form set this flag >> + >> config CPU_HIGH_VECTOR >> depends on !MMU && CPU_CP15 && !CPU_ARM740T >> >> bool "Select the High exception vector" > > >> Index: linux-linaro-tracking/arch/arm/mm/alignment.c >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/mm/alignment.c >> +++ linux-linaro-tracking/arch/arm/mm/alignment.c >> @@ -792,6 +792,10 @@ do_alignment(unsigned long addr, unsigne >> >> regs->ARM_pc += isize; >> >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + instr = __swab32(instr); /* little endian instruction */ >> +#endif /* CONFIG_CPU_ENDIAN_BE8 */ >> + >> switch (CODING_BITS(instr)) { >> case 0x00000000: /* 3.13.4 load/store instruction extensions */ >> if (LDSTHD_I_BIT(instr)) > > > See comments on
<asm/opcodes.h> > > >> Index: linux-linaro-tracking/arch/arm/kernel/head.S >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/kernel/head.S >> +++ linux-linaro-tracking/arch/arm/kernel/head.S >> @@ -89,6 +89,12 @@ ENTRY(stext) >> @ ensure svc mode and all interrupts masked >> safe_svcmode_maskall r9 >> >> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER >> + /* >> + * Switch to bigendian mode >> + */ >> + setend be >> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ >> mrc p15, 0, r9, c0, c0 @ get processor id >> bl __lookup_processor_type @ r5=procinfo r9=cpuid >> movs r10, r5 @ invalid processor (r5=0)? >> @@ -356,6 +362,12 @@ ENTRY(secondary_startup) >> #endif >> safe_svcmode_maskall r9 >> >> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER >> + /* >> + * Switch to bigendian mode >> + */ >> + setend be >> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ >> mrc p15, 0, r9, c0, c0 @ get processor id >> bl __lookup_processor_type >> movs r10, r5 @ invalid processor? 
>> @@ -427,6 +439,9 @@ __enable_mmu: >> #ifdef CONFIG_CPU_ICACHE_DISABLE >> bic r0, r0, #CR_I >> #endif >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + orr r0, r0, #CR_EE @ big-endian page tables >> +#endif >> #ifdef CONFIG_ARM_LPAE >> mov r5, #0 >> mcrr p15, 0, r4, r5, c2 @ load TTBR0 >> @@ -592,8 +607,14 @@ __fixup_a_pv_table: >> b 2f >> 1: add r7, r3 >> ldrh ip, [r7, #2] >> +#ifdef __ARMEB__ >> + rev ip, ip @ byteswap instruction >> +#endif /* __ARMEB__ */ >> and ip, 0x8f00 >> orr ip, r6 @ mask in offset bits 31-24 >> +#ifdef __ARMEB__ >> + rev ip, ip @ byteswap instruction >> +#endif /* __ARMEB__ */ >> strh ip, [r7, #2] >> 2: cmp r4, r5 >> ldrcc r7, [r4], #4 @ use branch for delay slot >> @@ -602,8 +623,14 @@ __fixup_a_pv_table: >> #else >> b 2f >> 1: ldr ip, [r7, r3] >> +#ifdef __ARMEB__ >> + rev ip, ip @ byteswap instruction >> +#endif /* __ARMEB__ */ >> bic ip, ip, #0x000000ff >> orr ip, ip, r6 @ mask in offset bits 31-24 >> +#ifdef __ARMEB__ >> + rev ip, ip @ byteswap instruction >> +#endif /* __ARMEB__ */ >> str ip, [r7, r3] >> 2: cmp r4, r5 >> ldrcc r7, [r4], #4 @ use branch for delay slot > > > Yeah, prefer my macros for doing this. 
> > >> Index: linux-linaro-tracking/arch/arm/kernel/kprobes.c >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/kernel/kprobes.c >> +++ linux-linaro-tracking/arch/arm/kernel/kprobes.c >> @@ -66,14 +66,24 @@ int __kprobes arch_prepare_kprobe(struct >> if (is_wide_instruction(insn)) { >> insn<<= 16; >> insn |= ((u16 *)addr)[1]; >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + insn = __swab32(insn); /* little endian instruction */ >> +#endif >> decode_insn = thumb32_kprobe_decode_insn; >> - } else >> - decode_insn = thumb16_kprobe_decode_insn; >> + } else { >> + decode_insn = thumb16_kprobe_decode_insn; >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + insn = __swab16(insn); /* little endian instruction */ >> +#endif >> + } >> #else /* !CONFIG_THUMB2_KERNEL */ >> thumb = false; >> if (addr& 0x3) >> >> return -EINVAL; >> insn = *p->addr; >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + insn = __swab32(insn); /* little endian instruction */ >> +#endif >> decode_insn = arm_kprobe_decode_insn; >> #endif >> >> @@ -89,7 +99,11 @@ int __kprobes arch_prepare_kprobe(struct >> if (!p->ainsn.insn) >> return -ENOMEM; >> for (is = 0; is< MAX_INSN_SIZE; ++is) >> +#ifndef CONFIG_CPU_ENDIAN_BE8 >> p->ainsn.insn[is] = tmp_insn[is]; >> +#else >> + p->ainsn.insn[is] = __swab32(tmp_insn[is]); >> /* little endian instruction */ >> +#endif >> flush_insns(p->ainsn.insn, >> sizeof(p->ainsn.insn[0]) * MAX_INSN_SIZE); >> p->ainsn.insn_fn = (kprobe_insn_fn_t *) >> @@ -113,10 +127,17 @@ void __kprobes arch_arm_kprobe(struct kp >> /* Remove any Thumb flag */ >> addr = (void *)((uintptr_t)p->addr& ~1); >> >> >> - if (is_wide_instruction(p->opcode)) >> + if (is_wide_instruction(p->opcode)) { >> brkp = KPROBE_THUMB32_BREAKPOINT_INSTRUCTION; >> - else >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + brkp = __swab32(brkp); >> +#endif /* CONFIG_CPU_ENDIAN_BE8 */ >> + } else { >> brkp = KPROBE_THUMB16_BREAKPOINT_INSTRUCTION; >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + brkp = __swab16(brkp); 
>> +#endif /* CONFIG_CPU_ENDIAN_BE8 */ >> + } >> } else { >> kprobe_opcode_t insn = p->opcode; >> >> @@ -127,6 +148,10 @@ void __kprobes arch_arm_kprobe(struct kp >> brkp |= 0xe0000000; /* Unconditional instruction */ >> else >> brkp |= insn& 0xf0000000; /* Copy condition from insn */ >> >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + brkp = __swab32(brkp); >> +#endif /* CONFIG_CPU_ENDIAN_BE8 */ >> + >> } >> >> patch_text(addr, brkp); >> @@ -144,8 +169,12 @@ int __kprobes __arch_disarm_kprobe(void >> { >> struct kprobe *kp = p; >> void *addr = (void *)((uintptr_t)kp->addr& ~1); >> >> + kprobe_opcode_t insn = kp->opcode; >> >> - __patch_text(addr, kp->opcode); >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + insn = __swab32(insn); /* little endian instruction */ >> +#endif >> + __patch_text(addr, insn); >> >> return 0; >> > > __patch_text already does swapping, so a little concerned that it > is being done twice in some places. > > > >> Index: linux-linaro-tracking/arch/arm/kernel/traps.c >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/kernel/traps.c >> +++ linux-linaro-tracking/arch/arm/kernel/traps.c >> @@ -426,6 +426,10 @@ asmlinkage void __exception do_undefinst >> goto die_sig; >> } >> >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + instr = __swab32(instr); /* little endian instruction */ >> +#endif /* CONFIG_CPU_ENDIAN_BE8 */ >> + >> if (call_undef_hook(regs, instr) == 0) >> return; > > > already fixed. 
> > >> Index: linux-linaro-tracking/arch/arm/kernel/ftrace.c >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/kernel/ftrace.c >> +++ linux-linaro-tracking/arch/arm/kernel/ftrace.c >> @@ -100,10 +100,18 @@ static int ftrace_modify_code(unsigned l >> if (probe_kernel_read(&replaced, (void *)pc, MCOUNT_INSN_SIZE)) >> return -EFAULT; >> >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + replaced = __swab32(replaced); /* little endian instruction */ >> +#endif /* CONFIG_CPU_ENDIAN_BE8 */ >> + >> if (replaced != old) >> return -EINVAL; >> } >> >> +#ifdef CONFIG_CPU_ENDIAN_BE8 >> + new = __swab32(new); /* little endian instruction */ >> +#endif /* CONFIG_CPU_ENDIAN_BE8 */ >> + >> if (probe_kernel_write((void *)pc,&new, MCOUNT_INSN_SIZE)) >> return -EPERM; > > > Please see <asm/opcodes.h> for dealing with the op-codes like this. Yes, agree. It is not needed any more. In kernel 3.1/3.2 there was no asm/opcodes.h, __opcode_to_mem_arm, etc macros. So that was added. It is incorrect now. > >> Index: linux-linaro-tracking/arch/arm/kernel/atags_parse.c >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/kernel/atags_parse.c >> +++ linux-linaro-tracking/arch/arm/kernel/atags_parse.c >> @@ -216,6 +216,10 @@ struct machine_desc * __init setup_machi >> if (tags->hdr.tag != ATAG_CORE) >> convert_to_tag_list(tags); >> #endif >> +#ifdef CONFIG_LITTLE_ENDIAN_LOADER >> + byteswap_tags(tags); >> +#endif /* CONFIG_LITTLE_ENDIAN_LOADER */ >> + >> if (tags->hdr.tag != ATAG_CORE) { >> early_print("Warning: Neither atags nor dtb found\n"); >> tags = (struct tag *)&default_tags; >> Index: linux-linaro-tracking/arch/arm/kernel/sleep.S > > > I really don't like this way, it makes it difficult to deal with > if you export this data to user-space to then do things like md5sum > it. Also, if you're going to kexec() a new kernel, would you have > to re-swap it back before this? 
Agree that swap in place could be problematic. I will look at this, starting with patches you've done. Will get back to you. >> Index: linux-linaro-tracking/arch/arm/include/asm/atomic.h >> =================================================================== >> --- linux-linaro-tracking.orig/arch/arm/include/asm/atomic.h >> +++ linux-linaro-tracking/arch/arm/include/asm/atomic.h >> @@ -301,8 +301,13 @@ static inline void atomic64_add(u64 i, a >> >> __asm__ __volatile__("@ atomic64_add\n" >> "1: ldrexd %0, %H0, [%3]\n" >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >> " adds %0, %0, %4\n" >> " adc %H0, %H0, %H4\n" >> +#else >> +" adds %H0, %H0, %H4\n" >> +" adc %0, %0, %4\n" >> +#endif >> " strexd %1, %0, %H0, [%3]\n" >> " teq %1, #0\n" >> " bne 1b" >> @@ -320,8 +325,13 @@ static inline u64 atomic64_add_return(u6 >> >> __asm__ __volatile__("@ atomic64_add_return\n" >> "1: ldrexd %0, %H0, [%3]\n" >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >> " adds %0, %0, %4\n" >> " adc %H0, %H0, %H4\n" >> +#else >> +" adds %H0, %H0, %H4\n" >> +" adc %0, %0, %4\n" >> +#endif >> " strexd %1, %0, %H0, [%3]\n" >> " teq %1, #0\n" >> " bne 1b" >> @@ -341,8 +351,13 @@ static inline void atomic64_sub(u64 i, a >> >> __asm__ __volatile__("@ atomic64_sub\n" >> "1: ldrexd %0, %H0, [%3]\n" >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >> " subs %0, %0, %4\n" >> " sbc %H0, %H0, %H4\n" >> +#else >> +" subs %H0, %H0, %H4\n" >> +" sbc %0, %0, %4\n" >> +#endif >> " strexd %1, %0, %H0, [%3]\n" >> " teq %1, #0\n" >> " bne 1b" >> @@ -360,8 +375,13 @@ static inline u64 atomic64_sub_return(u6 >> >> __asm__ __volatile__("@ atomic64_sub_return\n" >> "1: ldrexd %0, %H0, [%3]\n" >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >> " subs %0, %0, %4\n" >> " sbc %H0, %H0, %H4\n" >> +#else >> +" subs %H0, %H0, %H4\n" >> +" sbc %0, %0, %4\n" >> +#endif >> " strexd %1, %0, %H0, [%3]\n" >> " teq %1, #0\n" >> " bne 1b" >> @@ -428,9 +448,15 @@ static inline u64 atomic64_dec_if_positi >> 
>> __asm__ __volatile__("@ atomic64_dec_if_positive\n" >> "1: ldrexd %0, %H0, [%3]\n" >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >> " subs %0, %0, #1\n" >> " sbc %H0, %H0, #0\n" >> " teq %H0, #0\n" >> +#else >> +" subs %H0, %H0, #1\n" >> +" sbc %0, %0, #0\n" >> +" teq %0, #0\n" >> +#endif >> " bmi 2f\n" >> " strexd %1, %0, %H0, [%3]\n" >> " teq %1, #0\n" >> @@ -459,8 +485,13 @@ static inline int atomic64_add_unless(at >> " teqeq %H0, %H5\n" >> " moveq %1, #0\n" >> " beq 2f\n" >> +#ifndef CONFIG_CPU_BIG_ENDIAN /* little endian */ >> " adds %0, %0, %6\n" >> " adc %H0, %H0, %H6\n" >> +#else >> +" adds %H0, %H0, %H6\n" >> +" adc %0, %0, %6\n" >> +#endif >> " strexd %2, %0, %H0, [%4]\n" >> " teq %2, #0\n" >> " bne 1b\n" > > > I still believe that you're doing the wrong thing here > and that these can be left well enough alone. Please give > a concrete piece of code that can show they're actually > doing the wrong thing. That one is addressed in separate email. Thanks, Victor > > > > -- > Ben Dooks http://www.codethink.co.uk/ > Senior Engineer Codethink - Providing Genius ^ permalink raw reply [flat|nested] 13+ messages in thread
* updates for be8 patch series 2013-07-24 7:31 ` Victor Kamensky @ 2013-07-24 21:24 ` Victor Kamensky 0 siblings, 0 replies; 13+ messages in thread From: Victor Kamensky @ 2013-07-24 21:24 UTC (permalink / raw) To: linux-arm-kernel Hi Ben, > I believe largely your series covers the same things, in a more accurate way, > and should be continued. I plan to review your series and will try to > integrate it and run on Pandaboard ES. Note for this testing I would use a big > patch of TI OMAP4 drivers to be endian neutral. It will take me several days. > If I find anything I will let you know across these days. Here is what I got so far: setend be under ARM_BE8 or CPU_ENDIAN_BE8 usage ----------------------------------------------- In many places I see the 'setend be' instruction under ARM_BE8 ARM_BE8( setend be ) and ARM_BE8 defined as +/* Select code for any configuration running in BE8 mode */ +#ifdef CONFIG_CPU_ENDIAN_BE8 +#define ARM_BE8(code...) code +#else +#define ARM_BE8(code...) +#endif but strictly speaking you need to do that only if the CPU_BE8_BOOT_LE config is set. I.e. if the loader is not in little endian mode, you do not need those instructions; the CPU will already be in BE mode. IMHO ARM_BE8 should be used only if it is really a BE8-specific issue. There can also be plain big endian issues; those should be handled under the CPU_BIG_ENDIAN config. There can be an ARM CPU_BIG_ENDIAN system that is not CPU_ENDIAN_BE8; ARCH_IXP4xx is an example of that. For example the 'ARM: pl01x debug code endian fix' patch IMHO should use CPU_BIG_ENDIAN rather than CONFIG_CPU_ENDIAN_BE8 (as it is now through ARM_BE8). I.e. if the same serial console output function runs on a system that is CPU_BIG_ENDIAN but not CONFIG_CPU_ENDIAN_BE8, it would still need such a byteswap. I understand that it is a bit of a contrived example, since the code is really BSP specific and this BSP would never have CONFIG_CPU_ENDIAN_BE8 missing, but I am concerned that the meaning of the configs gets blurred if they are used like this.
Note you do have BE8_LE defined in a later patch ... A few places miss atags swaps on the read --------------------------------------- I've tried to compare your atags reading changes vs my byteswap-in-place code. I agree that byte swap on reading is better. It seems to me that you've missed a few places where conversion should happen (unless I am missing something). Also I think you need to add atag16_to_cpu and cpu_to_atag16 so short atag values can be handled. 1) arch/arm/kernel/atags_parse.c <global> 87 __tagtable(ATAG_VIDEOTEXT, parse_tag_videotext); It needs to swap (only relevant fields listed) struct tag_videotext { __u16 video_page; __u16 video_ega_bx; __u16 video_points; }; 2) arch/arm/kernel/atags_parse.c <global> 140 __tagtable(ATAG_CMDLINE, parse_tag_cmdline); It needs to swap cmdline; as far as I read the code it is a pointer ... wondering what is going on in the 64bit case ... struct tag_cmdline { char cmdline[1]; /* this is the minimum size */ }; 3) arch/arm/mach-footbridge/common.c <global> 51 __tagtable(ATAG_MEMCLK, parse_tag_memclk); For consistency's sake it is a good idea to convert this too. It needs to swap fmemclk on the read struct tag_memclk { __u32 fmemclk; }; 4) arch/arm/mach-rpc/riscpc.c <global> 65 __tagtable(ATAG_ACORN, parse_tag_acorn); For consistency's sake it is a good idea to convert this too. It needs to swap the following fields when reading struct tag_acorn { __u32 memc_control_reg; __u32 vram_pages; }; 5) arch/arm/mm/init.c <global> 68 __tagtable(ATAG_INITRD, parse_tag_initrd); 6) arch/arm/mm/init.c <global> 77 __tagtable(ATAG_INITRD2, parse_tag_initrd2); Need to swap struct tag_initrd { __u32 start; /* physical start address */ __u32 size; /* size of compressed ramdisk image in bytes */ }; Other than these notes, it looks very very good! Thanks, Victor ^ permalink raw reply [flat|nested] 13+ messages in thread
* updates for be8 patch series @ 2013-07-22 16:33 Ben Dooks 2013-07-22 16:49 ` Ben Dooks 0 siblings, 1 reply; 13+ messages in thread From: Ben Dooks @ 2013-07-22 16:33 UTC (permalink / raw) To: linux-arm-kernel I've updated the BE8 patch series after running further tests on the kernel. I found an issue with the alignment handler (and I believe the same issue with the trap handler). I also ran tests with the module loader, which shows that if we are in BE8 that it needs to fix up the relocations. Also the modules need to be built --be8 otherwise they cannot be easily relocated when loaded. Should patches 3 and 4 be together (ie, alter the LDFLAGS and the module.c) and have I got the correct LDFLAGS? Comments and testing welcome. I will push out a new series later with these in and possibly rebased on 3.11-rc2. ^ permalink raw reply [flat|nested] 13+ messages in thread
* updates for be8 patch series 2013-07-22 16:33 Ben Dooks @ 2013-07-22 16:49 ` Ben Dooks 0 siblings, 0 replies; 13+ messages in thread From: Ben Dooks @ 2013-07-22 16:49 UTC (permalink / raw) To: linux-arm-kernel On 22/07/13 17:33, Ben Dooks wrote: > I've updated the BE8 patch series after running further tests on the > kernel. I found an issue with the alignment handler (and I believe the > same issue with the trap handler). > > I also ran tests with the module loader, which shows that if we are > in BE8 that it needs to fix up the relocations. Also the modules need > to be built --be8 otherwise they cannot be easily relocated when loaded. > > Should patches 3 and 4 be together (ie, alter the LDFLAGS and the > module.c) and have I got the correct LDFLAGS? > > Comments and testing welcome. I will push out a new series later with > these in and possibly rebased on 3.11-rc2. New series based on 3.10 at: git://git.baserock.org/delta/linux.git baserock/311/be/core-v3 baserock/311/be/atags-v3 -- Ben Dooks http://www.codethink.co.uk/ Senior Engineer Codethink - Providing Genius ^ permalink raw reply [flat|nested] 13+ messages in thread