* [RFC PATCH] arm: decompressor: initialize PIC offset base register for uClinux tools
@ 2013-01-29 17:29 Jonathan Austin
2013-01-29 20:13 ` Nicolas Pitre
0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Austin @ 2013-01-29 17:29 UTC (permalink / raw)
To: linux-arm-kernel
Before jumping to (position independent) C-code from the decompressor's
assembler world we set-up the C environment. This setup currently does not
set r9, which for arm-none-uclinux-uclibceabi should be the PIC offset base
register (IE should point to the beginning of the GOT).
Currently, therefore, in order to build working kernels that use the
decompressor it is necessary to use an arm-linux-gnueabi toolchain, or
similar. uClinux toolchains cause a Prefetch Abort to occur at the beginning
of the decompress_kernel function.
This patch allows uClinux toolchains to build bootable zImages by setting r9
to the beginning of the GOT when __uClinux__ is defined, allowing the
decompressor's C functions to work correctly.
Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
---
One other possibility would be to specify -mno-single-pic-base when building
the decompressor. This works around the problem, but forces the compiler to
generate less optimal code.
arch/arm/boot/compressed/head.S | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index fe4d9c3..4491e75 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -410,6 +410,10 @@ wont_overwrite:
* sp = stack pointer
*/
orrs r1, r0, r5
+#ifdef __uClinux__
+ mov r9, r11 @ PIC offset base register
+ addne r9, r9, r0 @ Also needs relocating
+#endif
beq not_relocated
add r11, r11, r0
--
1.7.9.5
* [RFC PATCH] arm: decompressor: initialize PIC offset base register for uClinux tools
2013-01-29 17:29 [RFC PATCH] arm: decompressor: initialize PIC offset base register for uClinux tools Jonathan Austin
@ 2013-01-29 20:13 ` Nicolas Pitre
2013-02-01 16:43 ` Jonathan Austin
0 siblings, 1 reply; 9+ messages in thread
From: Nicolas Pitre @ 2013-01-29 20:13 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 29 Jan 2013, Jonathan Austin wrote:
> Before jumping to (position independent) C-code from the decompressor's
> assembler world we set-up the C environment. This setup currently does not
> set r9, which for arm-none-uclinux-uclibceabi should be the PIC offset base
> register (IE should point to the beginning of the GOT).
>
> Currently, therefore, in order to build working kernels that use the
> decompressor it is necessary to use an arm-linux-gnueabi toolchain, or
> similar. uClinux toolchains cause a Prefetch Abort to occur at the beginning
> of the decompress_kernel function.
>
> This patch allows uClinux toolchains to build bootable zImages by setting r9
> to the beginning of the GOT when __uClinux__ is defined, allowing the
> decompressor's C functions to work correctly.
>
> Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
> ---
>
> One other possibility would be to specify -mno-single-pic-base when building
> the decompressor. This works around the problem, but forces the compiler to
> generate less optimal code.
How "less optimal"? How much bigger/slower is it?
If not significant enough then going with -mno-single-pic-base might be
fine.
> arch/arm/boot/compressed/head.S | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index fe4d9c3..4491e75 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -410,6 +410,10 @@ wont_overwrite:
> * sp = stack pointer
> */
> orrs r1, r0, r5
> +#ifdef __uClinux__
> + mov r9, r11 @ PIC offset base register
> + addne r9, r9, r0 @ Also needs relocating
> +#endif
> beq not_relocated
Please don't insert your code between the orrs and the beq as those two
go logically together.
In fact, the best location for this would probably be between the
wont_overwrite label and the comment that immediately follows it. And
then, those comments that follow until the branch into C code should be
updated accordingly.
Nicolas
* [RFC PATCH] arm: decompressor: initialize PIC offset base register for uClinux tools
2013-01-29 20:13 ` Nicolas Pitre
@ 2013-02-01 16:43 ` Jonathan Austin
2013-02-01 18:07 ` Nicolas Pitre
2013-02-01 18:18 ` Russell King - ARM Linux
0 siblings, 2 replies; 9+ messages in thread
From: Jonathan Austin @ 2013-02-01 16:43 UTC (permalink / raw)
To: linux-arm-kernel
Hi Nicolas, thanks for the comments,
On 29/01/13 20:13, Nicolas Pitre wrote:
> On Tue, 29 Jan 2013, Jonathan Austin wrote:
>
>> Before jumping to (position independent) C-code from the decompressor's
>> assembler world we set-up the C environment. This setup currently does not
>> set r9, which for arm-none-uclinux-uclibceabi should be the PIC offset base
>> register (IE should point to the beginning of the GOT).
>>
>> Currently, therefore, in order to build working kernels that use the
>> decompressor it is necessary to use an arm-linux-gnueabi toolchain, or
>> similar. uClinux toolchains cause a Prefetch Abort to occur at the beginning
>> of the decompress_kernel function.
>>
>> This patch allows uClinux toolchains to build bootable zImages by setting r9
>> to the beginning of the GOT when __uClinux__ is defined, allowing the
>> decompressor's C functions to work correctly.
>>
>> Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
>> ---
>>
>> One other possibility would be to specify -mno-single-pic-base when building
>> the decompressor. This works around the problem, but forces the compiler to
>> generate less optimal code.
>
> How "less optimal"? How much bigger/slower is it?
> If not significant enough then going with -mno-single-pic-base might be
> fine.
Code that needs to access anything global will need to derive the location
of the GOT for itself, but there's a possible upside there that there's an
extra free register (r9 can be used as a general purpose register...)
The patch would look like:
-----8<-------
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 5cad8a6..afed28e 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -120,7 +120,7 @@ ORIG_CFLAGS := $(KBUILD_CFLAGS)
KBUILD_CFLAGS = $(subst -pg, , $(ORIG_CFLAGS))
endif
-ccflags-y := -fpic -fno-builtin -I$(obj)
+ccflags-y := -fpic -mno-single-pic-base -fno-builtin -I$(obj)
asflags-y := -Wa,-march=all -DZIMAGE
# Supply kernel BSS size to the decompressor via a linker symbol.
------>8---------
I did a fairly crude benchmark - count how many instructions we need in
order to finish decompressing the kernel...
Setup r9 correctly: 129,976,282
Use -mno-single-pic-base: 124,826,778
(this was done using an R-class model and a magic semi-hosting call to pause
the model at the end of the decompress_kernel function)
So, it seems like the extra register means there's actually a 4% *win*
in instruction terms from using -mno-single-pic-base
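For anyone checking the arithmetic, the "4%" is simply the relative difference between the two counts above. A minimal C sketch follows; the helper name percent_fewer is made up for illustration and is not part of the kernel or of the benchmark setup.

```c
#include <stdint.h>

/*
 * Relative reduction in executed instructions, as a percentage of the
 * baseline count. Assumes candidate <= baseline (illustrative helper,
 * not kernel code).
 */
double percent_fewer(uint64_t baseline, uint64_t candidate)
{
    return 100.0 * (double)(baseline - candidate) / (double)baseline;
}
```

Calling percent_fewer(129976282, 124826778) gives a little under 4%, which is where the rounded figure comes from.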
That said, I've still made some comments/amendments below...
>
>> arch/arm/boot/compressed/head.S | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
>> index fe4d9c3..4491e75 100644
>> --- a/arch/arm/boot/compressed/head.S
>> +++ b/arch/arm/boot/compressed/head.S
>> @@ -410,6 +410,10 @@ wont_overwrite:
>> * sp = stack pointer
>> */
>> orrs r1, r0, r5
>> +#ifdef __uClinux__
>> + mov r9, r11 @ PIC offset base register
>> + addne r9, r9, r0 @ Also needs relocating
>> +#endif
>> beq not_relocated
>
> Please don't insert your code between the orrs and the beq as those two
> go logically together.
I'd initially done this in order to change only one site - as we need to
set r9 and then add the offset I was using the condition code to test r0...
However, this was silly - I think I can just do it in one instruction:
add r9, r11, r0
In the case that we're not relocated, r0 should be 0 anyway...
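To convince myself the two forms agree, here is a small host-side C model of both sequences. It is only a sketch: the function names are hypothetical, and the condition mirrors the flags set by "orrs r1, r0, r5" (delta OR'd with the appended dtb size) rather than a test of r0 alone.

```c
#include <stdint.h>

/*
 * Original two-instruction form:
 *   mov   r9, r11
 *   addne r9, r9, r0     (NE comes from "orrs r1, r0, r5")
 */
uint32_t pic_base_conditional(uint32_t got_start, uint32_t delta,
                              uint32_t dtb_size)
{
    uint32_t r9 = got_start;
    if ((delta | dtb_size) != 0)    /* Z flag clear after the orrs */
        r9 += delta;
    return r9;
}

/* Simplified form: add r9, r11, r0 */
uint32_t pic_base_unconditional(uint32_t got_start, uint32_t delta)
{
    return got_start + delta;
}
```

Whenever the conditional add is skipped, delta is zero, so the unconditional add produces the same value; when only the dtb size is non-zero the conditional form adds zero anyway.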
>
> In fact, the best location for this would probably be between the
> wont_overwrite label and the comment that immediately follows it. And
> then, those comments that follow until the branch into C code should be
> updated accordingly.
Okay, assuming I've understood you correctly, you're suggesting something
like this:
-----8<-------
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index fe4d9c3..d81efbd 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -396,6 +396,9 @@ dtb_check_done:
mov pc, r0
wont_overwrite:
+#ifdef __uClinux__
+ add r9, r11, r0 @ uClinux PIC offset base register
+#endif
/*
* If delta is zero, we are running at the address we were linked at.
* r0 = delta
@@ -405,6 +408,7 @@ wont_overwrite:
* r5 = appended dtb size (0 if not present)
* r7 = architecture ID
* r8 = atags pointer
+ * r9 = GOT start (for uClinux ABI), relocated
* r11 = GOT start
* r12 = GOT end
* sp = stack pointer
@@ -470,6 +474,7 @@ not_relocated: mov r0, #0
* r4 = kernel execution address
* r7 = architecture ID
* r8 = atags pointer
+ * r9 = GOT start (for uClinux ABI)
*/
mov r0, r4
mov r1, sp @ malloc space above stack
------->8-----------
The question that now occurs is whether we should just set r9 whether or not
we're using a uClinux toolchain - I don't think it is going to hurt as the
arm-linux-gnueabi world can happily clobber it with no bad consequences...
But after all this, it seems that just using -mno-single-pic-base as in the patch
above is best...
Thoughts?
Jonny
* [RFC PATCH] arm: decompressor: initialize PIC offset base register for uClinux tools
2013-02-01 16:43 ` Jonathan Austin
@ 2013-02-01 18:07 ` Nicolas Pitre
2013-02-01 18:18 ` Russell King - ARM Linux
1 sibling, 0 replies; 9+ messages in thread
From: Nicolas Pitre @ 2013-02-01 18:07 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, 1 Feb 2013, Jonathan Austin wrote:
> Hi Nicolas, thanks for the comments,
>
> On 29/01/13 20:13, Nicolas Pitre wrote:
> > On Tue, 29 Jan 2013, Jonathan Austin wrote:
> >
> >> Before jumping to (position independent) C-code from the decompressor's
> >> assembler world we set-up the C environment. This setup currently does not
> >> set r9, which for arm-none-uclinux-uclibceabi should be the PIC offset base
> >> register (IE should point to the beginning of the GOT).
> >>
> >> Currently, therefore, in order to build working kernels that use the
> >> decompressor it is necessary to use an arm-linux-gnueabi toolchain, or
> >> similar. uClinux toolchains cause a Prefetch Abort to occur at the beginning
> >> of the decompress_kernel function.
> >>
> >> This patch allows uClinux toolchains to build bootable zImages by setting r9
> >> to the beginning of the GOT when __uClinux__ is defined, allowing the
> >> decompressor's C functions to work correctly.
> >>
> >> Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
> >> ---
> >>
> >> One other possibility would be to specify -mno-single-pic-base when building
> >> the decompressor. This works around the problem, but forces the compiler to
> >> generate less optimal code.
> >
> > How "less optimal"? How much bigger/slower is it?
> > If not significant enough then going with -mno-single-pic-base might be
> > fine.
>
> Code that needs to access anything global will need to derive the location
> of the GOT for itself, but there's a possible upside there that there's an
> extra free register (r9 can be used as a general purpose register...)
We try to minimize those in order to perform the easy relocation trick
which requires no reference to global initialized data. Hence this in
the linker script:
/DISCARD/ : {
*(.ARM.exidx*)
*(.ARM.extab*)
/*
* Discard any r/w data - this produces a link error if we have any,
* which is required for PIC decompression. Local data generates
* GOTOFF relocations, which prevents it being relocated independently
* of the text/got segments.
*/
*(.data)
}
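As a rough illustration of what that constraint means for C code in the decompressor, here is a sketch of keeping writable state out of .data. The names are hypothetical, and it assumes the startup code has cleared .bss before any C code runs.

```c
/*
 * Zero-initialised writable statics are placed in .bss and constants
 * typically in .rodata, so neither ends up in the discarded .data section.
 */
static unsigned int crc_seed;                               /* .bss */
static const unsigned int crc_seed_default = 0xffffffffu;   /* read-only */

/*
 * By contrast, a writable object with a non-zero initialiser, e.g.
 *     static unsigned int crc_seed = 0xffffffffu;
 * would normally be emitted into .data, which the linker script rejects.
 */

/* Set the value at run time instead of relying on an initialiser. */
void crc_seed_init(void)
{
    crc_seed = crc_seed_default;
}

unsigned int crc_seed_get(void)
{
    return crc_seed;
}
```

The design choice is simply to move initialisation from link time to run time, so nothing writable needs relocating separately from text and GOT.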
> The patch would look like:
> -----8<-------
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index 5cad8a6..afed28e 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -120,7 +120,7 @@ ORIG_CFLAGS := $(KBUILD_CFLAGS)
> KBUILD_CFLAGS = $(subst -pg, , $(ORIG_CFLAGS))
> endif
> -ccflags-y := -fpic -fno-builtin -I$(obj)
> +ccflags-y := -fpic -mno-single-pic-base -fno-builtin -I$(obj)
> asflags-y := -Wa,-march=all -DZIMAGE
> # Supply kernel BSS size to the decompressor via a linker symbol.
> ------>8---------
>
>
> I did a fairly crude benchmark - count how many instructions we need in
> order to finish decompressing the kernel...
>
> Setup r9 correctly: 129,976,282
> Use -mno-single-pic-base: 124,826,778
>
> (this was done using an R-class model and a magic semi-hosting call to pause
> the model at the end of the decompress_kernel function)
>
> So, it seems like the extra register means there's actually a 4% *win*
> in instruction terms from using -mno-single-pic-base
Looks like you have a winner.
Acked-by: Nicolas Pitre <nico@linaro.org>
> That said, I've still made some comments/amendments below...
>
> >
> >> arch/arm/boot/compressed/head.S | 4 ++++
> >> 1 file changed, 4 insertions(+)
> >>
> >> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> >> index fe4d9c3..4491e75 100644
> >> --- a/arch/arm/boot/compressed/head.S
> >> +++ b/arch/arm/boot/compressed/head.S
> >> @@ -410,6 +410,10 @@ wont_overwrite:
> >> * sp = stack pointer
> >> */
> >> orrs r1, r0, r5
> >> +#ifdef __uClinux__
> >> + mov r9, r11 @ PIC offset base register
> >> + addne r9, r9, r0 @ Also needs relocating
> >> +#endif
> >> beq not_relocated
> >
> > Please don't insert your code between the orrs and the beq as those two
> > go logically together.
>
> I'd initially done this in order to change only one site - as we need to
> set r9 and then add the offset I was using the condition code to test r0...
>
> However, this was silly - I think I can just do it in one instruction:
>
> add r9, r11, r0
>
> In the case that we're not relocated, r0 should be 0 anyway...
>
> >
> > In fact, the best location for this would probably be between the
> > wont_overwrite label and the comment that immediately follows it. And
> > then, those comments that follow until the branch into C code should be
> > updated accordingly.
>
>
> Okay, assuming I've understood you correctly, you're suggesting something
> like this:
>
> -----8<-------
>
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index fe4d9c3..d81efbd 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -396,6 +396,9 @@ dtb_check_done:
> mov pc, r0
> wont_overwrite:
> +#ifdef __uClinux__
> + add r9, r11, r0 @ uClinux PIC offset base register
> +#endif
> /*
> * If delta is zero, we are running at the address we were linked at.
> * r0 = delta
> @@ -405,6 +408,7 @@ wont_overwrite:
> * r5 = appended dtb size (0 if not present)
> * r7 = architecture ID
> * r8 = atags pointer
> + * r9 = GOT start (for uClinux ABI), relocated
> * r11 = GOT start
> * r12 = GOT end
> * sp = stack pointer
> @@ -470,6 +474,7 @@ not_relocated: mov r0, #0
> * r4 = kernel execution address
> * r7 = architecture ID
> * r8 = atags pointer
> + * r9 = GOT start (for uClinux ABI)
> */
> mov r0, r4
> mov r1, sp @ malloc space above stack
> ------->8-----------
Yes, that's what I was suggesting.
> The question that now occurs is whether we should just set r9 whether or not
> we're using a uClinux toolchain - I don't think it is going to hurt as the
> arm-linux-gnueabi world can happily clobber it with no bad consequences...
>
> But after all this, it seems that just using -mno-single-pic-base as in the patch
> above is best...
Indeed. As long as this option is compatible with all toolchains.
Nicolas
* [RFC PATCH] arm: decompressor: initialize PIC offset base register for uClinux tools
2013-02-01 16:43 ` Jonathan Austin
2013-02-01 18:07 ` Nicolas Pitre
@ 2013-02-01 18:18 ` Russell King - ARM Linux
2013-02-04 12:00 ` Jonathan Austin
1 sibling, 1 reply; 9+ messages in thread
From: Russell King - ARM Linux @ 2013-02-01 18:18 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Feb 01, 2013 at 04:43:31PM +0000, Jonathan Austin wrote:
> Code that needs to access anything global will need to derive the location
> of the GOT for itself, but there's a possible upside there that there's an
> extra free register (r9 can be used as a general purpose register...)
>
> The patch would look like:
> -----8<-------
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index 5cad8a6..afed28e 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -120,7 +120,7 @@ ORIG_CFLAGS := $(KBUILD_CFLAGS)
> KBUILD_CFLAGS = $(subst -pg, , $(ORIG_CFLAGS))
> endif
> -ccflags-y := -fpic -fno-builtin -I$(obj)
> +ccflags-y := -fpic -mno-single-pic-base -fno-builtin -I$(obj)
> asflags-y := -Wa,-march=all -DZIMAGE
> # Supply kernel BSS size to the decompressor via a linker symbol.
> ------>8---------
>
>
> I did a fairly crude benchmark - count how many instructions we need in
> order to finish decompressing the kernel...
>
> Setup r9 correctly: 129,976,282
> Use -mno-single-pic-base: 124,826,778
>
> (this was done using an R-class model and a magic semi-hosting call to pause
> the model at the end of the decompress_kernel function)
>
> So, it seems like the extra register means there's actually a 4% *win*
> in instruction terms from using -mno-single-pic-base
Hmm. This is the opposite of what I'd expect. -msingle-pic-base says:
Treat the register used for PIC addressing as read-only, rather
than loading it in the prologue for each function. The run-time
system is responsible for initializing this register with an
appropriate value before execution begins.
which implies that we should be able to load it before calling the C
code (as you're doing) and then the compiler won't issue instructions
to reload that register.
Giving -mno-single-pic-base suggests that it would turn _off_ this
behaviour (which afaik - sensibly - is not by default enabled.)
So, I'm not sure I fully understand what's going on here.
* [RFC PATCH] arm: decompressor: initialize PIC offset base register for uClinux tools
2013-02-01 18:18 ` Russell King - ARM Linux
@ 2013-02-04 12:00 ` Jonathan Austin
2013-02-04 12:07 ` Russell King - ARM Linux
0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Austin @ 2013-02-04 12:00 UTC (permalink / raw)
To: linux-arm-kernel
Hi Russell,
On 01/02/13 18:18, Russell King - ARM Linux wrote:
> On Fri, Feb 01, 2013 at 04:43:31PM +0000, Jonathan Austin wrote:
>> Code that needs to access anything global will need to derive the location
>> of the GOT for itself, but there's a possible upside there that there's an
>> extra free register (r9 can be used as a general purpose register...)
>>
>> The patch would look like:
>> -----8<-------
>> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
>> index 5cad8a6..afed28e 100644
>> --- a/arch/arm/boot/compressed/Makefile
>> +++ b/arch/arm/boot/compressed/Makefile
>> @@ -120,7 +120,7 @@ ORIG_CFLAGS := $(KBUILD_CFLAGS)
>> KBUILD_CFLAGS = $(subst -pg, , $(ORIG_CFLAGS))
>> endif
>> -ccflags-y := -fpic -fno-builtin -I$(obj)
>> +ccflags-y := -fpic -mno-single-pic-base -fno-builtin -I$(obj)
>> asflags-y := -Wa,-march=all -DZIMAGE
>> # Supply kernel BSS size to the decompressor via a linker symbol.
>> ------>8---------
>>
>>
>> I did a fairly crude benchmark - count how many instructions we need in
>> order to finish decompressing the kernel...
>>
>> Setup r9 correctly: 129,976,282
>> Use -mno-single-pic-base: 124,826,778
>>
>> (this was done using an R-class model and a magic semi-hosting call to pause
>> the model at the end of the decompress_kernel function)
>>
>> So, it seems like the extra register means there's actually a 4% *win*
>> in instruction terms from using -mno-single-pic-base
>
> Hmm. This is the opposite of what I'd expect. -msingle-pic-base says:
>
> Treat the register used for PIC addressing as read-only, rather
> than loading it in the prologue for each function. The run-time
> system is responsible for initializing this register with an
> appropriate value before execution begins.
>
> which implies that we should be able to load it before calling the C
> code (as you're doing) and then the compiler won't issue instructions
> to reload that register.
>
> Giving -mno-single-pic-base suggests that it would turn _off_ this
> behaviour (which afaik - sensibly - is not by default enabled.)
>
> So, I'm not sure I fully understand what's going on here.
You seem to have understood! Specifying -mno-single-pic-base
means the compiler *won't* expect r9 to point to the GOT, but also
means r9 is free as a general purpose register, which is the effect that I
believe gives the performance improvement in the decompressor.
Perhaps the missing context is that these are two independent patch
suggestions that achieve the same thing in different ways (that is,
they stop the decompressor running off to some incorrect memory location
because r9 isn't set up). The -mno-single-pic-base patch does it by not
using r9 as a PIC offset base, and the 'initialise r9' patch does what it says
on the tin.
As you see, I benchmarked them and got the opposite result to what I
expected (i.e. -mno-single-pic-base is quicker), so, based also on
Nicolas's Ack, I would now champion a different patch to the original one
that I posted...
This is probably overkill, but here's a simple C example for comparison:
$cat pic.c
----------------
int foo;
int ret_foo()
{
return foo;
}
----------------
$arm-none-uclinux-uclibceabi-gcc -O2 -fPIC -S pic.c -o pic.s
$cat pic.s
-------------
[...]
ret_foo:
ldr r3, .L2
ldr r3, [r9, r3]
ldr r0, [r3, #0]
bx lr
.L3:
.align 2
.L2:
.word foo(GOT)
.size ret_foo, .-ret_foo
[...]
-------------
$arm-none-uclinux-uclibceabi-gcc -O2 -fPIC -mno-single-pic-base -S pic.c -o no-single-pic-base.s
$cat no-single-pic-base.s
-------------
[...]
ret_foo:
ldr r3, .L2
ldr r2, .L2+4
.LPIC0:
add r3, pc, r3
ldr r3, [r3, r2]
ldr r0, [r3, #0]
bx lr
.L3:
.align 2
.L2:
.word _GLOBAL_OFFSET_TABLE_-(.LPIC0+8)
.word foo(GOT)
[...]
-----------
So we have a 'penalty' of an extra ldr and add when we don't use a read-only
PIC base, but the win of a free register seems to trump that in the decompressor.
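For readers less used to ARM PIC code, the two sequences can be modelled on a host machine in plain C. This is only a toy sketch: toy_got, anchor and the function names are invented for illustration, the "GOT" has a single slot, and the address of anchor stands in for the pc value used by the real code.

```c
#include <stdint.h>

static int foo = 42;                  /* the global being read */
static int *toy_got[1] = { &foo };    /* toy GOT: slot 0 holds &foo */
static const char anchor = 0;         /* stand-in for the pc value */

/* What the run-time system would place in r9 for -msingle-pic-base. */
int **toy_got_base(void)
{
    return toy_got;
}

/* -msingle-pic-base: the GOT base arrives ready-made in r9. */
int ret_foo_single_pic_base(int **r9)
{
    return *r9[0];                    /* ldr r3, [r9, r3]; ldr r0, [r3] */
}

/* Link-time constant, like _GLOBAL_OFFSET_TABLE_-(.LPIC0+8). */
intptr_t toy_got_offset(void)
{
    return (intptr_t)toy_got - (intptr_t)&anchor;
}

/* -mno-single-pic-base: the function derives the GOT base itself. */
int ret_foo_no_single_pic_base(void)
{
    /* the extra ldr (constant) and add (pc + constant) happen here */
    int **base = (int **)((intptr_t)&anchor + toy_got_offset());
    return *base[0];
}
```

Both functions return the same value; the second does not depend on anyone having set up r9 beforehand, which is the property the decompressor needs.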
Does that clear things up, or did I miss the point of what wasn't clear to you?
Jonny
* [RFC PATCH] arm: decompressor: initialize PIC offset base register for uClinux tools
2013-02-04 12:00 ` Jonathan Austin
@ 2013-02-04 12:07 ` Russell King - ARM Linux
2013-02-04 12:20 ` Jonathan Austin
0 siblings, 1 reply; 9+ messages in thread
From: Russell King - ARM Linux @ 2013-02-04 12:07 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Feb 04, 2013 at 12:00:00PM +0000, Jonathan Austin wrote:
> You seem to have understood! Specifying -mno-single-pic-base
> means the compiler *won't* expect r9 to point to the GOT, but also
> means r9 is free as a general purpose register, which is the effect that I
> believe gives the performance improvement in the decompressor.
>
> Perhaps the missing context is that these are two independent patch
> suggestions that achieve the same thing in different ways (that is,
> they stop the decompressor running off to some incorrect memory location
> because r9 isn't set up). The -mno-single-pic-base patch does it by not
> using r9 as a PIC offset base, and the 'initialise r9' patch does what it says
> on the tin.
>
> As you see, I benchmarked them and got the opposite result to what I
> expected (i.e. -mno-single-pic-base is quicker), so, based also on
> Nicolas's Ack, I would now champion a different patch to the original one
> that I posted...
>
> This is probably overkill, but here's a simple C example for comparison:
> $cat pic.c
> ----------------
> int foo;
> int ret_foo()
> {
> return foo;
> }
> ----------------
> $arm-none-uclinux-uclibceabi-gcc -O2 -fPIC -S pic.c -o pic.s
> $cat pic.s
> -------------
> [...]
> ret_foo:
> ldr r3, .L2
> ldr r3, [r9, r3]
> ldr r0, [r3, #0]
> bx lr
> .L3:
> .align 2
> .L2:
> .word foo(GOT)
> .size ret_foo, .-ret_foo
Ah, so the problem is that the default for single-pic-base is different
with uclinux compilers from other compilers. Other compilers will
default to -mno-single-pic-base, but what your build above shows is that
for your compiler, your default is -msingle-pic-base.
So, passing -mno-single-pic-base means that you're actually _restoring_
the compiler behaviour that we're expecting for the decompressor.
* [RFC PATCH] arm: decompressor: initialize PIC offset base register for uClinux tools
2013-02-04 12:07 ` Russell King - ARM Linux
@ 2013-02-04 12:20 ` Jonathan Austin
2013-02-04 12:23 ` Russell King - ARM Linux
0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Austin @ 2013-02-04 12:20 UTC (permalink / raw)
To: linux-arm-kernel
On 04/02/13 12:07, Russell King - ARM Linux wrote:
> On Mon, Feb 04, 2013 at 12:00:00PM +0000, Jonathan Austin wrote:
>> You seem to have understood! Specifying -mno-single-pic-base
>> means the compiler *won't* expect r9 to point to the GOT, but also
>> means r9 is free as a general purpose register, which is the effect that I
>> believe gives the performance improvement in the decompressor.
>>
>> Perhaps the missing context is that these are two independent patch
>> suggestions that achieve the same thing in different ways (that is,
>> they stop the decompressor running off to some incorrect memory location
>> because r9 isn't set up). The -mno-single-pic-base patch does it by not
>> using r9 as a PIC offset base, and the 'initialise r9' patch does what it says
>> on the tin.
>>
>> As you see, I benchmarked them and got the opposite result to what I
>> expected (i.e. -mno-single-pic-base is quicker), so, based also on
>> Nicolas's Ack, I would now champion a different patch to the original one
>> that I posted...
>>
>> This is probably overkill, but here's a simple C example for comparison:
>> $cat pic.c
>> ----------------
>> int foo;
>> int ret_foo()
>> {
>> return foo;
>> }
>> ----------------
>> $arm-none-uclinux-uclibceabi-gcc -O2 -fPIC -S pic.c -o pic.s
>> $cat pic.s
>> -------------
>> [...]
>> ret_foo:
>> ldr r3, .L2
>> ldr r3, [r9, r3]
>> ldr r0, [r3, #0]
>> bx lr
>> .L3:
>> .align 2
>> .L2:
>> .word foo(GOT)
>> .size ret_foo, .-ret_foo
>
> Ah, so the problem is that the default for single-pic-base is different
> with uclinux compilers from other compilers. Other compilers will
> default to -mno-single-pic-base, but what your build above shows is that
> for your compiler, your default is -msingle-pic-base.
>
> So, passing -mno-single-pic-base means that you're actually _restoring_
> the compiler behaviour that we're expecting for the decompressor.
>
Ahh, I see.
My experience is that my toolchain behaves much like most other uclinux
toolchains - I've just checked:
- Codesourcery
- A Pengutronix one for the M3
- An ARM one
And they all default to using r9.
So shall I put the -m*no*-single-pic-base one in to the patch system?
Jonny
* [RFC PATCH] arm: decompressor: initialize PIC offset base register for uClinux tools
2013-02-04 12:20 ` Jonathan Austin
@ 2013-02-04 12:23 ` Russell King - ARM Linux
0 siblings, 0 replies; 9+ messages in thread
From: Russell King - ARM Linux @ 2013-02-04 12:23 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Feb 04, 2013 at 12:20:52PM +0000, Jonathan Austin wrote:
> My experience is that my toolchain behaves much like most other uclinux
> toolchains - I've just checked:
> - Codesourcery
> - A Pengutronix one for the M3
> - An ARM one
>
> And they all default to using r9.
>
> So shall I put the -m*no*-single-pic-base one in to the patch system?
Yup, though I don't think I'll be pushing it until the next merge window.