* [U-Boot] Relocation size penalty calculation
@ 2009-10-08 11:54 Graeme Russ
2009-10-08 14:14 ` Peter Tyser
2009-10-08 15:58 ` J. William Campbell
0 siblings, 2 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-08 11:54 UTC (permalink / raw)
To: u-boot
Out of curiosity, I wanted to see just how much of a size penalty I am
incurring by using gcc -fpic / ld -pie on my x86 u-boot build. Here are
the results (fixed width font will help - it's space, not tab, formatted):
Section             non-reloc  reloc
---------------------------------------
.text               000118c4   000137fc  <- 0x1f38 bytes (~8kB) bigger
.rodata             00005bad   000059d0
.interp             n/a        00000013
.dynstr             n/a        00000648
.hash               n/a        00000428
.eh_frame           00003268   000034fc
.data               00000a6c   000001dc
.data.rel           n/a        00000098
.data.rel.ro.local  n/a        00000178
.data.rel.local     n/a        000007e4
.got                00000000   000001f0
.got.plt            n/a        0000000c
.rel.got            n/a        000003e0
.rel.dyn            n/a        00001228
.dynsym             n/a        00000850
.dynamic            n/a        00000080
.u_boot_cmd         000003c0   000003c0
.bss                00001a34   00001a34
.realmode           00000166   00000166
.bios               0000053e   0000053e
=======================================
Total               0001d5dd   00022287  <- 0x4caa bytes (~19kB) bigger
It's more than a 16% increase in size!!!
.text accounts for a little under half of the total bloat, and of that,
the crude dynamic loader accounts for only 341 bytes.
Have any metrics been done for PPC?
Regards,
Graeme
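For anyone who wants to re-check the arithmetic, here is a minimal Python sketch that re-derives the totals and percentages from the table above. The section sizes are copied from the table; the names `SECTIONS` and `column_total` are made up for this illustration, and `None` stands in for the n/a entries.

```python
# Section sizes in bytes, copied from the table above as (non-reloc, reloc).
# None stands for an "n/a" entry, i.e. the section is absent from that build.
SECTIONS = {
    ".text":              (0x000118c4, 0x000137fc),
    ".rodata":            (0x00005bad, 0x000059d0),
    ".interp":            (None,       0x00000013),
    ".dynstr":            (None,       0x00000648),
    ".hash":              (None,       0x00000428),
    ".eh_frame":          (0x00003268, 0x000034fc),
    ".data":              (0x00000a6c, 0x000001dc),
    ".data.rel":          (None,       0x00000098),
    ".data.rel.ro.local": (None,       0x00000178),
    ".data.rel.local":    (None,       0x000007e4),
    ".got":               (0x00000000, 0x000001f0),
    ".got.plt":           (None,       0x0000000c),
    ".rel.got":           (None,       0x000003e0),
    ".rel.dyn":           (None,       0x00001228),
    ".dynsym":            (None,       0x00000850),
    ".dynamic":           (None,       0x00000080),
    ".u_boot_cmd":        (0x000003c0, 0x000003c0),
    ".bss":               (0x00001a34, 0x00001a34),
    ".realmode":          (0x00000166, 0x00000166),
    ".bios":              (0x0000053e, 0x0000053e),
}


def column_total(index):
    """Sum one column of the table, counting n/a entries as zero bytes."""
    return sum(sizes[index] or 0 for sizes in SECTIONS.values())


non_reloc_total = column_total(0)
reloc_total = column_total(1)
penalty = reloc_total - non_reloc_total
text_growth = SECTIONS[".text"][1] - SECTIONS[".text"][0]

print(f"non-reloc total : {non_reloc_total:#x}")
print(f"reloc total     : {reloc_total:#x}")
print(f"penalty         : {penalty:#x} bytes "
      f"({100 * penalty / non_reloc_total:.1f}%)")
print(f".text share     : {100 * text_growth / penalty:.1f}% of the penalty")
```

The column sums match the Total row, and the difference is the 0x4caa bytes quoted in the table.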
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 11:54 [U-Boot] Relocation size penalty calculation Graeme Russ
@ 2009-10-08 14:14 ` Peter Tyser
2009-10-08 15:53 ` J. William Campbell
2009-10-08 15:58 ` J. William Campbell
1 sibling, 1 reply; 47+ messages in thread
From: Peter Tyser @ 2009-10-08 14:14 UTC (permalink / raw)
To: u-boot
On Thu, 2009-10-08 at 22:54 +1100, Graeme Russ wrote:
> Have any metrics been done for PPC?
Things actually improve a little bit when we use -mrelocatable and get
rid of all the manual "+= gd->reloc_off" fixups:
1) Top of mainline on XPedite5370:
   text    data     bss     dec     hex filename
 308612   24488   33172  366272   596c0 u-boot
2) Top of "reloc" branch on XPedite5370 (i.e. -mrelocatable):
   text    data     bss     dec     hex filename
 303704   28644   33156  365504   593c0 u-boot
For fun:
3) #2 but with s/-mrelocatable/-fpic/ (probably doesn't boot):
   text    data     bss     dec     hex filename
 303704   24472   33156  361332   58374 u-boot
There may be some other changes that affect the size between mainline
and "reloc", but their sizes are in the same general ballpark.
Best,
Peter
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 14:14 ` Peter Tyser
@ 2009-10-08 15:53 ` J. William Campbell
2009-10-08 16:15 ` Peter Tyser
0 siblings, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-08 15:53 UTC (permalink / raw)
To: u-boot
Peter Tyser wrote:
> Things actually improve a little bit when we use -mrelocatable and get
> rid of all the manual "+= gd->reloc_off" fixups:
>
> 1) Top of mainline on XPedite5370:
>    text    data     bss     dec     hex filename
>  308612   24488   33172  366272   596c0 u-boot
>
> 2) Top of "reloc" branch on XPedite5370 (i.e. -mrelocatable):
>    text    data     bss     dec     hex filename
>  303704   28644   33156  365504   593c0 u-boot
>
>
Hi Peter,
Just to be clear, the total text+data length of u-boot with the
"manual" relocations (#1) is LARGER than the text+data length of u-boot
with the "manual" relocations removed and the necessary centralized
relocation code added, along with any additional data sections required
by -mrelocatable (#2), by 768 (dec) bytes? And both cases (1 and 2)
work equivalently?
Best Regards,
Bill Campbell.
> _______________________________________________
> U-Boot mailing list
> U-Boot at lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot
>
>
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 11:54 [U-Boot] Relocation size penalty calculation Graeme Russ
2009-10-08 14:14 ` Peter Tyser
@ 2009-10-08 15:58 ` J. William Campbell
2009-10-08 20:58 ` Graeme Russ
1 sibling, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-08 15:58 UTC (permalink / raw)
To: u-boot
Graeme Russ wrote:
> Out of curiosity, I wanted to see just how much of a size penalty I am
> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
> the results (fixed width font will help - its space, not tab, formatted):
Hi Graeme,
I would be interested in a third option (column), the x86 build
with just -mrelocatable but NOT -fpic. It will not be definitive
because there will be extra code that references the GOT and missing
code to do some of the relocation, but it would still be interesting.
Best Regards,
Bill Campbell
> Have any metrics been done for PPC?
>
> Regards,
>
> Graeme
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 15:53 ` J. William Campbell
@ 2009-10-08 16:15 ` Peter Tyser
2009-10-08 16:50 ` J. William Campbell
0 siblings, 1 reply; 47+ messages in thread
From: Peter Tyser @ 2009-10-08 16:15 UTC (permalink / raw)
To: u-boot
On Thu, 2009-10-08 at 08:53 -0700, J. William Campbell wrote:
> Peter Tyser wrote:
> > Things actually improve a little bit when we use -mrelocatable and get
> > rid of all the manual "+= gd->reloc_off" fixups:
> >
> > 1) Top of mainline on XPedite5370:
> >    text    data     bss     dec     hex filename
> >  308612   24488   33172  366272   596c0 u-boot
> >
> > 2) Top of "reloc" branch on XPedite5370 (i.e. -mrelocatable):
> >    text    data     bss     dec     hex filename
> >  303704   28644   33156  365504   593c0 u-boot
> >
> >
> Hi Peter,
> Just to be clear, the total text+data length of u-boot with the
> "manual" relocations (#1) is LARGER than the text+data length of u-boot
> with the "manual" relocations removed and the necessary centralized
> relocation code added, along with any additional data sections required
> by -mrelocatable (#2), by 768 (dec) bytes?
Hi Bill,
D'oh, looks like I chose a bad board as an example. The XPedite5370
already had -mrelocatable defined in its own
board/xes/xpedite5370/config.mk in mainline, so the above comparison
should be ignored as both builds used -mrelocatable.
Here are some *real* results from the MPC8548CDS:
1) Top of mainline:
   text    data     bss     dec     hex filename
 219968   17052   22992  260012   3f7ac u-boot
2) Top of "reloc" branch (i.e. -mrelocatable):
   text    data     bss     dec     hex filename
 219192   20640   22980  262812   4029c u-boot
So the reloc branch is 2.7K bigger for the MPC8548CDS.
Best,
Peter
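The MPC8548CDS numbers can be cross-checked with a short Python sketch. The figures are copied from the two `size` outputs above. The `Size` tuple and the helper names are invented for this illustration, and treating text+data as the ROM footprint (with .bss occupying no flash) is an assumption of the sketch, not something stated in the thread.

```python
from collections import namedtuple

# One row of `size` output: text, data and bss in bytes.
Size = namedtuple("Size", "text data bss")

# Figures copied from the MPC8548CDS results above.
mainline = Size(text=219968, data=17052, bss=22992)
reloc = Size(text=219192, data=20640, bss=22980)


def dec_total(s):
    """The 'dec' column printed by size: text + data + bss."""
    return s.text + s.data + s.bss


def rom_footprint(s):
    """Assumption: only text and data are stored in flash, .bss is not."""
    return s.text + s.data


total_delta = dec_total(reloc) - dec_total(mainline)
rom_delta = rom_footprint(reloc) - rom_footprint(mainline)

print(f"total delta: {total_delta} bytes ({total_delta / 1024:.1f}K)")
print(f"ROM delta  : {rom_delta} bytes "
      f"({100 * rom_delta / rom_footprint(mainline):.2f}% of text+data)")
```

Note that .text actually shrinks on the "reloc" branch while .data grows, so the net growth comes from the data side.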
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 16:15 ` Peter Tyser
@ 2009-10-08 16:50 ` J. William Campbell
0 siblings, 0 replies; 47+ messages in thread
From: J. William Campbell @ 2009-10-08 16:50 UTC (permalink / raw)
To: u-boot
Peter Tyser wrote:
> Hi Bill,
> D'oh, looks like I chose a bad board as an example. The XPedite5370
> already had -mrelocatable defined in its own
> board/xes/xpedite5370/config.mk in mainline, so the above comparison
> should be ignored as both builds used -mrelocatable.
>
> Here are some *real* results from the MPC8548CDS:
> 1) Top of mainline:
>    text    data     bss     dec     hex filename
>  219968   17052   22992  260012   3f7ac u-boot
>
> 2) Top of "reloc" branch (i.e. -mrelocatable):
>    text    data     bss     dec     hex filename
>  219192   20640   22980  262812   4029c u-boot
>
> So the reloc branch is 2.7K bigger for the MPC8548CDS.
>
Hi Peter,
OK, that's more like it! A 1.2% size increase in ROM seems like a
very small price to pay for a truly relocatable u-boot image that will
run on any size memory without the programmer having to actively worry
about what may need relocating as code is written. Also, it should be
noted that the size increase in 2) is mostly in relocation segments
that do not need to be copied into RAM, so the RAM footprint should be
smaller for 2) than 1). The relocation code itself could also be placed
in a segment that is not copied into RAM, although that may be more
trouble than it is worth.
I am looking forward to Graeme's results with the 386. I expect
that it will not be quite so favorable, perhaps a 4 or 5% size increase
for -mrelocatable over an absolute build. However, -mrelocatable vs.
-fpic may be comparable, with -mrelocatable actually winning. But then
again, I could be totally wrong!
Best Regards,
Bill Campbell
> Best,
> Peter
>
>
>
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 15:58 ` J. William Campbell
@ 2009-10-08 20:58 ` Graeme Russ
2009-10-08 21:23 ` Wolfgang Denk
2009-10-08 22:27 ` J. William Campbell
0 siblings, 2 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-08 20:58 UTC (permalink / raw)
To: u-boot
On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
<jwilliamcampbell@comcast.net> wrote:
> Hi Graeme,
> I would be interested in a third option (column), the x86 build with
> just -mrelocatable but NOT -fpic. It will not be definitive because there
> will be extra code that references the GOT and missing code to do some of
> the relocation, but it would still be interesting.
x86 does not have -mrelocatable. This is a PPC only option :(
>
> Best Regards,
> Bill Campbell
>>
>> Have any metrics been done for PPC?
>>
>> Regards,
>>
>> Graeme
Once the reloc branch has been merged, how many arches are left which do
not support relocation?
Regards,
Graeme
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 20:58 ` Graeme Russ
@ 2009-10-08 21:23 ` Wolfgang Denk
2009-10-08 22:02 ` Graeme Russ
2009-10-08 22:27 ` J. William Campbell
1 sibling, 1 reply; 47+ messages in thread
From: Wolfgang Denk @ 2009-10-08 21:23 UTC (permalink / raw)
To: u-boot
Dear Graeme Russ,
In message <d66caabb0910081358h5b013922tf7f9dce4cce41c64@mail.gmail.com> you wrote:
>
>
> Once the reloc branch has been merged, how many arches are left which do
> not support relocation?
All but PPC ?
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
There comes to all races an ultimate crisis which you have yet to
face .... One day our minds became so powerful we dared think of
ourselves as gods.
-- Sargon, "Return to Tomorrow", stardate 4768.3
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 21:23 ` Wolfgang Denk
@ 2009-10-08 22:02 ` Graeme Russ
2009-10-08 22:20 ` Peter Tyser
0 siblings, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-08 22:02 UTC (permalink / raw)
To: u-boot
On Fri, Oct 9, 2009 at 8:23 AM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Graeme Russ,
>
> In message <d66caabb0910081358h5b013922tf7f9dce4cce41c64@mail.gmail.com> you wrote:
>>
>>
>> Once the reloc branch has been merged, how many arches are left which do
>> not support relocation?
>
> All but PPC ?
Hmm, so commit 0630535e2d062dd73c1ceca5c6125c86d1127a49 is all about
removing code that is not used because these arches do not do any
relocation at all?
So ultimately, what we are looking at is the complete and utter
removal of any code which references a relocation adjustment, in favour
of each arch either:
a) Executing in place from Flash, or;
b) Setting a fixed TEXT_BASE at a known RAM location and copying
the contents of Flash to RAM, or;
c) Implementing full relocation
>
> Best regards,
>
> Wolfgang Denk
>
Regards,
Graeme
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 22:02 ` Graeme Russ
@ 2009-10-08 22:20 ` Peter Tyser
2009-10-09 1:25 ` Mike Frysinger
2009-10-09 1:43 ` Graeme Russ
0 siblings, 2 replies; 47+ messages in thread
From: Peter Tyser @ 2009-10-08 22:20 UTC (permalink / raw)
To: u-boot
On Fri, 2009-10-09 at 09:02 +1100, Graeme Russ wrote:
> On Fri, Oct 9, 2009 at 8:23 AM, Wolfgang Denk <wd@denx.de> wrote:
> > Dear Graeme Russ,
> >
> > In message <d66caabb0910081358h5b013922tf7f9dce4cce41c64@mail.gmail.com> you wrote:
> >>
> >>
> >> Once the reloc branch has been merged, how many arches are left which do
> >> not support relocation?
> >
> > All but PPC ?
>
> Hmm, so commit 0630535e2d062dd73c1ceca5c6125c86d1127a49 is all about
> removing code that is not used because these arches do not do any
> relocation at all?
I sent that patch/RFC after noticing none of those architectures
performed manual relocation fixups, thus they could save some code space
by defining CONFIG_RELOC_FIXUP_WORKS. Similarly the gd->reloc_off field
was no longer needed for them.
I'm not familiar with whether or how those architectures are relocating, just
that they didn't need relocation fixups. So that was the logic...
> So ultimately, what we are looking at is the complete and utter
> removal of any code which references a relocation adjustment in lieu
> of each arch either:
>
> a) Execute in Place from Flash, or;
> b) Setting a fixed TEXT_BASE at a known RAM location and copying
> the contents of Flash to RAM, or;
> c) Implementing full Relocation
d) Leaving those architectures the way they are now
could be added too, if a, b and c won't work for some reason.
I think it would be great to remove any manual relocation adjustments in
the long run. This isn't strictly necessary though, as we can still
have manual relocations littering the code - it's just a bit dirty and
prone to issues in the long run.
So my vote would be to shoot for c) for all arches, but I have no idea
what impact that would have on them:)
Best,
Peter
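To illustrate why scattered manual fixups are "prone to issues", here is a toy Python model of the `+= gd->reloc_off` pattern. It is not U-Boot code. The addresses, the table names (`cmd_table`, `other_table`) and the helpers are all hypothetical, and the sketch assumes `reloc_off` is simply the RAM destination minus the link address.

```python
# Hypothetical addresses, chosen only for illustration.
LINK_BASE = 0xFFF80000   # address the image was linked to run at (flash)
RAM_BASE = 0x0FF80000    # address the image is copied to in RAM
IMAGE_SIZE = 0x00080000

# Assumed meaning of gd->reloc_off: destination minus link address.
reloc_off = RAM_BASE - LINK_BASE


def fixup_table(table, offset):
    """Manual fixup: add the relocation offset to every stored pointer."""
    return [(ptr + offset) & 0xFFFFFFFF for ptr in table]


def points_into_ram(ptr):
    """True if the pointer lands inside the relocated copy of the image."""
    return RAM_BASE <= ptr < RAM_BASE + IMAGE_SIZE


# Two tables of link-time pointers, as the compiler would emit them.
cmd_table = [LINK_BASE + 0x1000, LINK_BASE + 0x1040]
other_table = [LINK_BASE + 0x2000]

# The programmer remembers to fix up one table but forgets the other.
cmd_table = fixup_table(cmd_table, reloc_off)

print([hex(p) for p in cmd_table], all(points_into_ram(p) for p in cmd_table))
print([hex(p) for p in other_table], all(points_into_ram(p) for p in other_table))
```

The forgotten table still points at the flash addresses. That is the kind of silent breakage that full relocation, option c), avoids by fixing up every pointer in one central place.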
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 20:58 ` Graeme Russ
2009-10-08 21:23 ` Wolfgang Denk
@ 2009-10-08 22:27 ` J. William Campbell
2009-10-08 22:39 ` Graeme Russ
1 sibling, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-08 22:27 UTC (permalink / raw)
To: u-boot
Graeme Russ wrote:
> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
> <jwilliamcampbell@comcast.net> wrote:
>
>> Hi Graeme,
>> I would be interested in a third option (column), the x86 build with
>> just -mrelocatable but NOT -fpic. It will not be definitive because there
>> will be extra code that references the GOT and missing code to do some of
>> the relocation, but it would still be interesting.
>>
>
> x86 does not have -mrelocatable. This is a PPC only option :(
>
Hi Graeme,
You are unfortunately correct. However, I wonder if we can
get essentially the same result by executing the final ld step with the
--emit-relocs switch included. This may also include some "extra"
sections that we would want to strip out, but if it works, it could give
all ELF-based systems a way to a relocatable u-boot.
Best Regards,
Bill Campbell
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 22:27 ` J. William Campbell
@ 2009-10-08 22:39 ` Graeme Russ
2009-10-08 23:12 ` Joakim Tjernlund
0 siblings, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-08 22:39 UTC (permalink / raw)
To: u-boot
On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
<jwilliamcampbell@comcast.net> wrote:
> Hi Graeme,
> You are unfortunately correct. However, I wonder if we can get
> essentially the same result by executing the final ld step with the
> --emit-relocs switch included. This may also include some "extra" sections
> that we would want to strip out, but if it works, it could give all
> ELF-based systems a way to a relocatable u-boot.
>
I don't think --emit-relocs is necessary with -pic. I haven't gone through
all the permutations to see if there is a smaller option, but gcc -fpic and
ld -pie create enough information to perform relocation on the x86
platform.
Regards,
Graeme
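As a rough illustration of what "enough information to perform relocation" can look like, here is a Python sketch that walks a `.rel.dyn`-style table and applies R_386_RELATIVE fixups to an in-memory image. It is not the loader used in the x86 port. The function name and the toy image are invented here. The sketch assumes standard 8-byte Elf32_Rel entries (`r_offset`, `r_info`), with the addend stored in place, and skips every other relocation type.

```python
import struct

R_386_RELATIVE = 8   # relocation type number from the i386 ELF ABI
ELF32_REL_SIZE = 8   # sizeof(Elf32_Rel): r_offset + r_info, 4 bytes each


def apply_relative_relocs(image, rel_dyn, link_base, load_base):
    """Add (load_base - link_base) to every word named by a RELATIVE entry."""
    delta = load_base - link_base
    for r_offset, r_info in struct.iter_unpack("<II", rel_dyn):
        if r_info & 0xFF != R_386_RELATIVE:
            continue  # this sketch only handles base-relative entries
        pos = r_offset - link_base
        (value,) = struct.unpack_from("<I", image, pos)
        struct.pack_into("<I", image, pos, (value + delta) & 0xFFFFFFFF)
    return image


# Toy image linked at 0x1000: the word at offset 4 is a pointer to offset 8.
image = bytearray(16)
struct.pack_into("<I", image, 4, 0x1008)
rel_dyn = struct.pack("<II", 0x1004, R_386_RELATIVE)

apply_relative_relocs(image, rel_dyn, link_base=0x1000, load_base=0x8000)
relocated_ptr = struct.unpack_from("<I", image, 4)[0]
print(hex(relocated_ptr))

# Under the 8-byte entry assumption, the 0x1228-byte .rel.dyn from the
# original table would hold this many entries:
rel_dyn_entries = 0x1228 // ELF32_REL_SIZE
print(rel_dyn_entries)
```

If the entry-size assumption holds, most of the non-.text growth is these tables plus the dynamic symbol data, which fits with the loader itself being only 341 bytes.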
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 22:39 ` Graeme Russ
@ 2009-10-08 23:12 ` Joakim Tjernlund
2009-10-09 0:09 ` J. William Campbell
2009-10-10 4:43 ` Graeme Russ
0 siblings, 2 replies; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-08 23:12 UTC (permalink / raw)
To: u-boot
>
> I don't think --emit-relocs is necessary with -pic. I haven't gone through
> all the permutations to see if there is a smaller option, but gcc -fpic and
> ld -pie creates enough information to perform relocation on the x86
> platform
Try -fvisibility=hidden
Jocke
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 23:12 ` Joakim Tjernlund
@ 2009-10-09 0:09 ` J. William Campbell
2009-10-10 4:43 ` Graeme Russ
1 sibling, 0 replies; 47+ messages in thread
From: J. William Campbell @ 2009-10-09 0:09 UTC (permalink / raw)
To: u-boot
Joakim Tjernlund wrote:
>> I don't think --emit-relocs is necessary with -pic. I haven't gone through
>> all the permutations to see if there is a smaller option, but gcc -fpic and
>> ld -pie creates enough information to perform relocation on the x86
>> platform
>>
>
>
It is true that --emit-relocs is not required when -pic and -pie are
used instead. However, pic and pie are designed to allow shared code
(libraries) to appear at different logical addresses in several
programs without altering the text. This is grand overkill for what we
need, which is the ability to relocate the code. The -pic and -pie code
will be larger than the code without pic and pie. How much larger is a
good question. On the PPC, it is larger but not much larger, because
there are lots of registers available and one of them has almost surely got
(no pun intended) the magic relocation constant(s) in it. On the 386, with
many fewer registers, pic and pie will cause the code to be
percentage-wise larger than on the PPC. Thus avoiding pic and pie is a
Good Thing in most cases.
> Try -fvisibility=hidden
>
I assume the -fvisibility=hidden is suggested in order to reduce
(eliminate) the symbol table from the output, which we don't need
because there are assumed to be no undefined symbols in our final ld. If
that works, great! I was assuming we might need a custom "strip" program
to delete any sections that we don't need, but this sounds easier if it
gets them all.
Best Regards,
Bill Campbell
> Jocke
>
>
>
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 22:20 ` Peter Tyser
@ 2009-10-09 1:25 ` Mike Frysinger
2009-10-09 1:43 ` Graeme Russ
1 sibling, 0 replies; 47+ messages in thread
From: Mike Frysinger @ 2009-10-09 1:25 UTC (permalink / raw)
To: u-boot
On Thursday 08 October 2009 18:20:18 Peter Tyser wrote:
> On Fri, 2009-10-09 at 09:02 +1100, Graeme Russ wrote:
> > On Fri, Oct 9, 2009 at 8:23 AM, Wolfgang Denk <wd@denx.de> wrote:
> > > Graeme Russ wrote:
> > >> Once the reloc branch has been merged, how many arches are left which
> > >> do not support relocation?
> > >
> > > All but PPC ?
> >
> > Hmm, so commit 0630535e2d062dd73c1ceca5c6125c86d1127a49 is all about
> > removing code that is not used because these arches do not do any
> > relocation at all?
>
> I sent that patch/RFC after noticing none of those architectures
> performed manual relocation fixups, thus they could save some code space
> by defining CONFIG_RELOC_FIXUP_WORKS. Similarly the gd->reloc_off field
> was no longer needed for them.
>
> I'm not familiar with if or how those architectures are relocating, just
> that they didn't need relocation fixups. So that was the logic...
The usage in the Blackfin port is most likely a copy & paste of existing code.
Deleting malloc_bin_reloc() from lib_blackfin/board.c and adding
CONFIG_RELOC_FIXUP_WORKS results in a working boot. I've never really looked
into relocation as no one has asked for it.
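For readers unfamiliar with the term, a "manual relocation fixup" can be sketched in C roughly as follows. This is a minimal illustration, not the actual U-Boot code: fixup_table() and its reloc_off parameter are hypothetical names standing in for loops of the kind malloc_bin_reloc() presumably performs and for gd->reloc_off, and the table holds link-time addresses as plain integers so the sketch can run on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: add reloc_off to every non-zero link-time address
 * in table[]. Zero entries are treated as NULL pointers and left alone.
 * Returns the number of entries that were adjusted.
 */
size_t fixup_table(uintptr_t *table, size_t n, uintptr_t reloc_off)
{
	size_t i, fixed = 0;

	assert(table != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (table[i] == 0)
			continue;	/* NULL stays NULL after relocation */
		table[i] += reloc_off;
		fixed++;
	}
	return fixed;
}
```

If the toolchain's own relocation records already cover such pointers, which is what CONFIG_RELOC_FIXUP_WORKS appears to assert, loops like this can be compiled out.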
-mike
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 22:20 ` Peter Tyser
2009-10-09 1:25 ` Mike Frysinger
@ 2009-10-09 1:43 ` Graeme Russ
1 sibling, 0 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-09 1:43 UTC (permalink / raw)
To: u-boot
On Fri, Oct 9, 2009 at 9:20 AM, Peter Tyser <ptyser@xes-inc.com> wrote:
> On Fri, 2009-10-09 at 09:02 +1100, Graeme Russ wrote:
>> On Fri, Oct 9, 2009 at 8:23 AM, Wolfgang Denk <wd@denx.de> wrote:
>> > Dear Graeme Russ,
>> >
>> > In message <d66caabb0910081358h5b013922tf7f9dce4cce41c64@mail.gmail.com> you wrote:
>> >>
>> >>
>> >> Once the reloc branch has been merged, how many arches are left which do
>> >> not support relocation?
>> >
>> > All but PPC ?
>>
>> Hmm, so commit 0630535e2d062dd73c1ceca5c6125c86d1127a49 is all about
>> removing code that is not used because these arches do not do any
>> relocation at all?
>
> I sent that patch/RFC after noticing none of those architectures
> performed manual relocation fixups, thus they could save some code space
> by defining CONFIG_RELOC_FIXUP_WORKS. Similarly the gd->reloc_off field
> was no longer needed for them.
Maybe CONFIG_RELOC_NOT_IMPLEMENTED would be more descriptive?
>
> I'm not familiar with if or how those architectures are relocating, just
> that they didn't need relocation fixups. So that was the logic...
>
>> So ultimately, what we are looking at is the complete and utter
>> removal of any code which references a relocation adjustment in lieu
>> of each arch either:
>>
>> a) Execute in Place from Flash, or;
>> b) Setting a fixed TEXT_BASE at a known RAM location and copying
>> the contents of Flash to RAM, or;
>> c) Implementing full Relocation
>
> d) Leaving those architectures the way they are now
> Could be added if a,b,c won't work for some reason too.
Which is essentially either a) or b), depending on which way the arch
was implemented. For x86, it has been b), but it is moving towards c).
>
> I think it would be great to remove any manual relocation adjustments in
> the long run. This isn't strictly necessary though, as we can still
> have manual relocations littering the code - its just a bit dirty and
> prone to issues in the long run.
>
> So my vote would be to shoot for c) for all arches, but I have no idea
> what impact that would have on them:)
So the big question now is: how many arches do partial relocation
and really need gd->reloc_off?
>
> Best,
> Peter
>
>
Regards,
Graeme
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-08 23:12 ` Joakim Tjernlund
2009-10-09 0:09 ` J. William Campbell
@ 2009-10-10 4:43 ` Graeme Russ
2009-10-10 8:07 ` Joakim Tjernlund
1 sibling, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-10 4:43 UTC (permalink / raw)
To: u-boot
On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
>>
>> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
>> <jwilliamcampbell@comcast.net> wrote:
>> > Graeme Russ wrote:
>> >>
>> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
>> >> <jwilliamcampbell@comcast.net> wrote:
>> >>
>> >>>
>> >>> Graeme Russ wrote:
>> >>>
>> >>>>
>> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
>> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>> >>>> the results (fixed width font will help - its space, not tab,
>> >>>> formatted):
>> >>>>
>> >>>> Section non-reloc reloc
>> >>>> ---------------------------------------
>> >>>> .text 000118c4 000137fc <- 0x1f38 bytes (~8kB) bigger
>> >>>> .rodata 00005bad 000059d0
>> >>>> .interp n/a 00000013
>> >>>> .dynstr n/a 00000648
>> >>>> .hash n/a 00000428
>> >>>> .eh_frame 00003268 000034fc
>> >>>> .data 00000a6c 000001dc
>> >>>> .data.rel n/a 00000098
>> >>>> .data.rel.ro.local n/a 00000178
>> >>>> .data.rel.local n/a 000007e4
>> >>>> .got 00000000 000001f0
>> >>>> .got.plt n/a 0000000c
>> >>>> .rel.got n/a 000003e0
>> >>>> .rel.dyn n/a 00001228
>> >>>> .dynsym n/a 00000850
>> >>>> .dynamic n/a 00000080
>> >>>> .u_boot_cmd 000003c0 000003c0
>> >>>> .bss 00001a34 00001a34
>> >>>> .realmode 00000166 00000166
>> >>>> .bios 0000053e 0000053e
>> >>>> =======================================
>> >>>> Total 0001d5dd 00022287 <- 0x4caa bytes (~19kB) bigger
>> >>>>
>> >>>> Its more than a 16% increase in size!!!
>> >>>>
>> >>>> .text accounts for a little under half of the total bloat, and of that,
>> >>>> the crude dynamic loader accounts for only 341 bytes
>> >>>>
>> >>>>
>> >>>
>> >>> Hi Graeme,
>> >>> I would be interested in a third option (column), the x86 build with
>> >>> just -mrelocateable but NOT -fpic. It will not be definitive because
>> >>> there
>> >>> will be extra code that references the GOT and missing code to do some of
>> >>> the relocation, but it would still be interesting.
>> >>>
>> >>
>> >> x86 does not have -mrelocatable. This is a PPC only option :(
>> >>
>> >
>> > Hi Graeme,
>> > You are unfortunately correct. However, I wonder if we can get
>> > essentially the same result by executing the final ld step with the
>> > --emit-relocs switch included. This may also include some "extra" sections
>> > that we would want to strip out, but if it works, it could give all
>> > ELF-based systems a way to a relocatable u-boot.
>> >
>>
>> I don't think --emit-relocs is necessary with -pic. I haven't gone through
>> all the permutations to see if there is a smaller option, but gcc -fpic and
>> ld -pie creates enough information to perform relocation on the x86
>> platform
>
> Try -fvisibility=hidden
Thanks - that shaved another 2539 bytes off the binary.
Also found out how to get rid of .eh_frame (it crept in when I upgraded to
gcc 4.4.1) with -fno-dwarf2-cfi-asm, which shaves off another 13452 bytes.
Total saving: 15991 bytes (~15.6kB).
>
> Jocke
>
>
Regards,
Graeme
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-10 4:43 ` Graeme Russ
@ 2009-10-10 8:07 ` Joakim Tjernlund
2009-10-10 8:46 ` Graeme Russ
0 siblings, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-10 8:07 UTC (permalink / raw)
To: u-boot
Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
>
> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
> <joakim.tjernlund@transmode.se> wrote:
> >>
> >> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
> >> <jwilliamcampbell@comcast.net> wrote:
> >> > Graeme Russ wrote:
> >> >>
> >> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
> >> >> <jwilliamcampbell@comcast.net> wrote:
> >> >>
> >> >>>
> >> >>> Graeme Russ wrote:
> >> >>>
> >> >>>>
> >> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
> >> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
> >> >>>> the results (fixed width font will help - its space, not tab,
> >> >>>> formatted):
> >> >>>>
> >> >>>> Section non-reloc reloc
> >> >>>> ---------------------------------------
> >> >>>> .text 000118c4 000137fc <- 0x1f38 bytes (~8kB) bigger
> >> >>>> .rodata 00005bad 000059d0
> >> >>>> .interp n/a 00000013
> >> >>>> .dynstr n/a 00000648
> >> >>>> .hash n/a 00000428
> >> >>>> .eh_frame 00003268 000034fc
> >> >>>> .data 00000a6c 000001dc
> >> >>>> .data.rel n/a 00000098
> >> >>>> .data.rel.ro.local n/a 00000178
> >> >>>> .data.rel.local n/a 000007e4
> >> >>>> .got 00000000 000001f0
> >> >>>> .got.plt n/a 0000000c
> >> >>>> .rel.got n/a 000003e0
> >> >>>> .rel.dyn n/a 00001228
> >> >>>> .dynsym n/a 00000850
> >> >>>> .dynamic n/a 00000080
> >> >>>> .u_boot_cmd 000003c0 000003c0
> >> >>>> .bss 00001a34 00001a34
> >> >>>> .realmode 00000166 00000166
> >> >>>> .bios 0000053e 0000053e
> >> >>>> =======================================
> >> >>>> Total 0001d5dd 00022287 <- 0x4caa bytes (~19kB) bigger
> >> >>>>
> >> >>>> Its more than a 16% increase in size!!!
> >> >>>>
> >> >>>> .text accounts for a little under half of the total bloat, and of that,
> >> >>>> the crude dynamic loader accounts for only 341 bytes
> >> >>>>
> >> >>>>
> >> >>>
> >> >>> Hi Graeme,
> >> >>> I would be interested in a third option (column), the x86 build with
> >> >>> just -mrelocateable but NOT -fpic. It will not be definitive because
> >> >>> there
> >> >>> will be extra code that references the GOT and missing code to do some of
> >> >>> the relocation, but it would still be interesting.
> >> >>>
> >> >>
> >> >> x86 does not have -mrelocatable. This is a PPC only option :(
> >> >>
> >> >
> >> > Hi Graeme,
> >> > You are unfortunately correct. However, I wonder if we can get
> >> > essentially the same result by executing the final ld step with the
> >> > --emit-relocs switch included. This may also include some "extra" sections
> >> > that we would want to strip out, but if it works, it could give all
> >> > ELF-based systems a way to a relocatable u-boot.
> >> >
> >>
> >> I don't think --emit-relocs is necessary with -pic. I haven't gone through
> >> all the permutations to see if there is a smaller option, but gcc -fpic and
> >> ld -pie creates enough information to perform relocation on the x86
> >> platform
> >
> > Try -fvisibility=hidden
>
> Thanks - Shaved another 2539 bytes off the binary
>
> Also found out how to get rid of .eh_frame (crept in when I upgraded to
> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
>
> Total saving of 15.6k
Great, so now you are back to just a few percent added, I guess?
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-10 8:07 ` Joakim Tjernlund
@ 2009-10-10 8:46 ` Graeme Russ
2009-10-10 9:27 ` Joakim Tjernlund
0 siblings, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-10 8:46 UTC (permalink / raw)
To: u-boot
On Sat, Oct 10, 2009 at 7:07 PM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
>>
>> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>> >>
>> >> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
>> >> <jwilliamcampbell@comcast.net> wrote:
>> >> > Graeme Russ wrote:
>> >> >>
>> >> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
>> >> >> <jwilliamcampbell@comcast.net> wrote:
>> >> >>
>> >> >>>
>> >> >>> Graeme Russ wrote:
>> >> >>>
>> >> >>>>
>> >> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
>> >> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>> >> >>>> the results (fixed width font will help - its space, not tab,
>> >> >>>> formatted):
>> >> >>>>
>> >> >>>> Section non-reloc reloc
>> >> >>>> ---------------------------------------
>> >> >>>> .text 000118c4 000137fc <- 0x1f38 bytes (~8kB) bigger
>> >> >>>> .rodata 00005bad 000059d0
>> >> >>>> .interp n/a 00000013
>> >> >>>> .dynstr n/a 00000648
>> >> >>>> .hash n/a 00000428
>> >> >>>> .eh_frame 00003268 000034fc
>> >> >>>> .data 00000a6c 000001dc
>> >> >>>> .data.rel n/a 00000098
>> >> >>>> .data.rel.ro.local n/a 00000178
>> >> >>>> .data.rel.local n/a 000007e4
>> >> >>>> .got 00000000 000001f0
>> >> >>>> .got.plt n/a 0000000c
>> >> >>>> .rel.got n/a 000003e0
>> >> >>>> .rel.dyn n/a 00001228
>> >> >>>> .dynsym n/a 00000850
>> >> >>>> .dynamic n/a 00000080
>> >> >>>> .u_boot_cmd 000003c0 000003c0
>> >> >>>> .bss 00001a34 00001a34
>> >> >>>> .realmode 00000166 00000166
>> >> >>>> .bios 0000053e 0000053e
>> >> >>>> =======================================
>> >> >>>> Total 0001d5dd 00022287 <- 0x4caa bytes (~19kB) bigger
>> >> >>>>
>> >> >>>> Its more than a 16% increase in size!!!
>> >> >>>>
>> >> >>>> .text accounts for a little under half of the total bloat, and of that,
>> >> >>>> the crude dynamic loader accounts for only 341 bytes
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>> Hi Graeme,
>> >> >>> I would be interested in a third option (column), the x86 build with
>> >> >>> just -mrelocateable but NOT -fpic. It will not be definitive because
>> >> >>> there
>> >> >>> will be extra code that references the GOT and missing code to do some of
>> >> >>> the relocation, but it would still be interesting.
>> >> >>>
>> >> >>
>> >> >> x86 does not have -mrelocatable. This is a PPC only option :(
>> >> >>
>> >> >
>> >> > Hi Graeme,
>> >> > You are unfortunately correct. However, I wonder if we can get
>> >> > essentially the same result by executing the final ld step with the
>> >> > --emit-relocs switch included. This may also include some "extra" sections
>> >> > that we would want to strip out, but if it works, it could give all
>> >> > ELF-based systems a way to a relocatable u-boot.
>> >> >
>> >>
>> >> I don't think --emit-relocs is necessary with -pic. I haven't gone through
>> >> all the permutations to see if there is a smaller option, but gcc -fpic and
>> >> ld -pie creates enough information to perform relocation on the x86
>> >> platform
>> >
>> > Try -fvisibility=hidden
>>
>> Thanks - Shaved another 2539 bytes off the binary
>>
>> Also found out how to get rid of .eh_frame (crept in when I upgraded to
>> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
>>
>> Total saving of 15.6k
>
> Great, so now you are back at just a few percent added I guess?
>
>
Not really - the .eh_frame saving applies to both relocated and
non-relocated builds.
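To see why dropping .eh_frame from both builds barely moves the relocation penalty, the arithmetic can be sketched in C using the totals and .eh_frame sizes from the table at the top of the thread. The function names are made up for illustration, and the figures come from that original table rather than from the later gcc 4.4.1 builds, so treat the result as indicative only.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute size penalty, in bytes, of the relocatable build. */
uint32_t penalty_bytes(uint32_t non_reloc, uint32_t reloc)
{
	assert(reloc >= non_reloc);
	return reloc - non_reloc;
}

/*
 * Penalty relative to the non-relocatable build, in tenths of a percent,
 * truncated towards zero (163 means 16.3%).
 */
uint32_t penalty_permille(uint32_t non_reloc, uint32_t reloc)
{
	assert(non_reloc > 0);
	return (uint32_t)(((uint64_t)penalty_bytes(non_reloc, reloc) * 1000u)
			  / non_reloc);
}
```

With the table's totals (0x1d5dd and 0x22287) the penalty is 0x4caa (19626) bytes, about 16.3%. Subtracting .eh_frame from each column (0x3268 and 0x34fc) leaves a penalty of 18966 bytes, about 17.7%: the absolute gap shrinks by only 660 bytes while the baseline shrinks by far more, so the percentage goes up rather than down.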
Regards,
Graeme
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-10 8:46 ` Graeme Russ
@ 2009-10-10 9:27 ` Joakim Tjernlund
2009-10-10 10:38 ` Graeme Russ
0 siblings, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-10 9:27 UTC (permalink / raw)
To: u-boot
Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 10:46:52:
>
> On Sat, Oct 10, 2009 at 7:07 PM, Joakim Tjernlund
> <joakim.tjernlund@transmode.se> wrote:
> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
> >>
> >> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
> >> <joakim.tjernlund@transmode.se> wrote:
> >> >>
> >> >> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
> >> >> <jwilliamcampbell@comcast.net> wrote:
> >> >> > Graeme Russ wrote:
> >> >> >>
> >> >> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
> >> >> >> <jwilliamcampbell@comcast.net> wrote:
> >> >> >>
> >> >> >>>
> >> >> >>> Graeme Russ wrote:
> >> >> >>>
> >> >> >>>>
> >> >> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
> >> >> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
> >> >> >>>> the results (fixed width font will help - its space, not tab,
> >> >> >>>> formatted):
> >> >> >>>>
> >> >> >>>> Section non-reloc reloc
> >> >> >>>> ---------------------------------------
> >> >> >>>> .text 000118c4 000137fc <- 0x1f38 bytes (~8kB) bigger
> >> >> >>>> .rodata 00005bad 000059d0
> >> >> >>>> .interp n/a 00000013
> >> >> >>>> .dynstr n/a 00000648
> >> >> >>>> .hash n/a 00000428
> >> >> >>>> .eh_frame 00003268 000034fc
> >> >> >>>> .data 00000a6c 000001dc
> >> >> >>>> .data.rel n/a 00000098
> >> >> >>>> .data.rel.ro.local n/a 00000178
> >> >> >>>> .data.rel.local n/a 000007e4
> >> >> >>>> .got 00000000 000001f0
> >> >> >>>> .got.plt n/a 0000000c
> >> >> >>>> .rel.got n/a 000003e0
> >> >> >>>> .rel.dyn n/a 00001228
> >> >> >>>> .dynsym n/a 00000850
> >> >> >>>> .dynamic n/a 00000080
> >> >> >>>> .u_boot_cmd 000003c0 000003c0
> >> >> >>>> .bss 00001a34 00001a34
> >> >> >>>> .realmode 00000166 00000166
> >> >> >>>> .bios 0000053e 0000053e
> >> >> >>>> =======================================
> >> >> >>>> Total 0001d5dd 00022287 <- 0x4caa bytes (~19kB) bigger
> >> >> >>>>
> >> >> >>>> Its more than a 16% increase in size!!!
> >> >> >>>>
> >> >> >>>> .text accounts for a little under half of the total bloat, and of that,
> >> >> >>>> the crude dynamic loader accounts for only 341 bytes
> >> >> >>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>> Hi Graeme,
> >> >> >>> I would be interested in a third option (column), the x86 build with
> >> >> >>> just -mrelocateable but NOT -fpic. It will not be definitive because
> >> >> >>> there
> >> >> >>> will be extra code that references the GOT and missing code to do some of
> >> >> >>> the relocation, but it would still be interesting.
> >> >> >>>
> >> >> >>
> >> >> >> x86 does not have -mrelocatable. This is a PPC only option :(
> >> >> >>
> >> >> >
> >> >> > Hi Graeme,
> >> >> > You are unfortunately correct. However, I wonder if we can get
> >> >> > essentially the same result by executing the final ld step with the
> >> >> > --emit-relocs switch included. This may also include some "extra" sections
> >> >> > that we would want to strip out, but if it works, it could give all
> >> >> > ELF-based systems a way to a relocatable u-boot.
> >> >> >
> >> >>
> >> >> I don't think --emit-relocs is necessary with -pic. I haven't gone through
> >> >> all the permutations to see if there is a smaller option, but gcc -fpic and
> >> >> ld -pie creates enough information to perform relocation on the x86
> >> >> platform
> >> >
> >> > Try -fvisibility=hidden
> >>
> >> Thanks - Shaved another 2539 bytes off the binary
> >>
> >> Also found out how to get rid of .eh_frame (crept in when I upgraded to
> >> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
> >>
> >> Total saving of 15.6k
> >
> > Great, so now you are back at just a few percent added I guess?
> >
> >
>
> Not really - The .eh_frame saving applies to both relocated and non
> relocated builds
OK, so you didn't use PIC before at all?
Anyway, I think you can do more. Using -Bsymbolic, you should get
away with RELATIVE relocs only and be able to skip a lot of the sections above.
Have a look at uClibc's ldso/ldso/dl-startup.c.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-10 9:27 ` Joakim Tjernlund
@ 2009-10-10 10:38 ` Graeme Russ
2009-10-10 10:47 ` Joakim Tjernlund
0 siblings, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-10 10:38 UTC (permalink / raw)
To: u-boot
On Sat, Oct 10, 2009 at 8:27 PM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
>
>
> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 10:46:52:
>>
>> On Sat, Oct 10, 2009 at 7:07 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
>> >>
>> >> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
>> >> <joakim.tjernlund@transmode.se> wrote:
>> >> >>
>> >> >> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
>> >> >> <jwilliamcampbell@comcast.net> wrote:
>> >> >> > Graeme Russ wrote:
>> >> >> >>
>> >> >> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
>> >> >> >> <jwilliamcampbell@comcast.net> wrote:
>> >> >> >>
>> >> >> >>>
>> >> >> >>> Graeme Russ wrote:
>> >> >> >>>
>> >> >> >>>>
>> >> >> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
>> >> >> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>> >> >> >>>> the results (fixed width font will help - its space, not tab,
>> >> >> >>>> formatted):
>> >> >> >>>>
>> >> >> >>>> Section non-reloc reloc
>> >> >> >>>> ---------------------------------------
>> >> >> >>>> .text 000118c4 000137fc <- 0x1f38 bytes (~8kB) bigger
>> >> >> >>>> .rodata 00005bad 000059d0
>> >> >> >>>> .interp n/a 00000013
>> >> >> >>>> .dynstr n/a 00000648
>> >> >> >>>> .hash n/a 00000428
>> >> >> >>>> .eh_frame 00003268 000034fc
>> >> >> >>>> .data 00000a6c 000001dc
>> >> >> >>>> .data.rel n/a 00000098
>> >> >> >>>> .data.rel.ro.local n/a 00000178
>> >> >> >>>> .data.rel.local n/a 000007e4
>> >> >> >>>> .got 00000000 000001f0
>> >> >> >>>> .got.plt n/a 0000000c
>> >> >> >>>> .rel.got n/a 000003e0
>> >> >> >>>> .rel.dyn n/a 00001228
>> >> >> >>>> .dynsym n/a 00000850
>> >> >> >>>> .dynamic n/a 00000080
>> >> >> >>>> .u_boot_cmd 000003c0 000003c0
>> >> >> >>>> .bss 00001a34 00001a34
>> >> >> >>>> .realmode 00000166 00000166
>> >> >> >>>> .bios 0000053e 0000053e
>> >> >> >>>> =======================================
>> >> >> >>>> Total 0001d5dd 00022287 <- 0x4caa bytes (~19kB) bigger
>> >> >> >>>>
>> >> >> >>>> Its more than a 16% increase in size!!!
>> >> >> >>>>
>> >> >> >>>> .text accounts for a little under half of the total bloat, and of that,
>> >> >> >>>> the crude dynamic loader accounts for only 341 bytes
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>
>> >> >> >>> Hi Graeme,
>> >> >> >>> I would be interested in a third option (column), the x86 build with
>> >> >> >>> just -mrelocateable but NOT -fpic. It will not be definitive because
>> >> >> >>> there
>> >> >> >>> will be extra code that references the GOT and missing code to do some of
>> >> >> >>> the relocation, but it would still be interesting.
>> >> >> >>>
>> >> >> >>
>> >> >> >> x86 does not have -mrelocatable. This is a PPC only option :(
>> >> >> >>
>> >> >> >
>> >> >> > Hi Graeme,
>> >> >> > You are unfortunately correct. However, I wonder if we can get
>> >> >> > essentially the same result by executing the final ld step with the
>> >> >> > --emit-relocs switch included. This may also include some "extra" sections
>> >> >> > that we would want to strip out, but if it works, it could give all
>> >> >> > ELF-based systems a way to a relocatable u-boot.
>> >> >> >
>> >> >>
>> >> >> I don't think --emit-relocs is necessary with -pic. I haven't gone through
>> >> >> all the permutations to see if there is a smaller option, but gcc -fpic and
>> >> >> ld -pie creates enough information to perform relocation on the x86
>> >> >> platform
>> >> >
>> >> > Try -fvisibility=hidden
>> >>
>> >> Thanks - Shaved another 2539 bytes off the binary
>> >>
>> >> Also found out how to get rid of .eh_frame (crept in when I upgraded to
>> >> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
>> >>
>> >> Total saving of 15.6k
>> >
>> > Great, so now you are back at just a few percent added I guess?
>> >
>> >
>>
>> Not really - The .eh_frame saving applies to both relocated and non
>> relocated builds
>
> OK, so you didn't use PIC before at all?
>
> Anyway I think you can do more. Using -Bsymbolic you should get
> away with RELATIVE relocs only and be able to skip a lot of segments above.
> Have a look at uClibc ldso/ldso/dl-startup.c
>
>
My build options thus far are:
PLATFORM_RELFLAGS += -fpie -fvisibility=hidden
PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
PLATFORM_LDFLAGS += -pie
-fpic / -pic make no difference.
Interestingly, -Bsymbolic adds exactly 8 bytes to .dynamic, but doesn't
change the size of any other section.
Pulling apart the relocation sections, it seems that all relocations are
already RELATIVE even without -Bsymbolic.
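If every relocation really is RELATIVE, the loader's job reduces to adding the load offset to each listed word. The thread does not show the actual loader, so the following is only a sketch under stated assumptions: the entries are taken to have the 8-byte Elf32_Rel layout, struct rel32 and apply_relative_relocs() are hypothetical names, and the image is patched in a host buffer rather than in place in RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_386_RELATIVE 8	/* i386 ELF relocation type: word += load offset */

/* Same layout as an Elf32_Rel entry; declared locally to stay self-contained. */
struct rel32 {
	uint32_t r_offset;	/* link-time address of the word to patch */
	uint32_t r_info;	/* symbol index in bits 8..31, type in bits 0..7 */
};

/*
 * Hypothetical helper: patch an image that was linked at link_base and is
 * being moved by delta bytes. Only R_386_RELATIVE entries are applied;
 * anything else is skipped. Returns the number of words patched.
 */
size_t apply_relative_relocs(uint8_t *image, size_t image_size,
			     uint32_t link_base,
			     const struct rel32 *rel, size_t count,
			     uint32_t delta)
{
	size_t i, applied = 0;

	assert(image != NULL && rel != NULL);

	for (i = 0; i < count; i++) {
		uint32_t off, word;

		if ((rel[i].r_info & 0xff) != R_386_RELATIVE)
			continue;
		off = rel[i].r_offset - link_base;
		if (off > image_size || image_size - off < sizeof(word))
			continue;	/* target lies outside the image */
		memcpy(&word, image + off, sizeof(word));
		word += delta;
		memcpy(image + off, &word, sizeof(word));
		applied++;
	}
	return applied;
}
```

No symbol lookup is needed for this relocation type, which is why sections such as .dynsym, .dynstr and .hash become candidates for removal. If the records are indeed 8-byte entries, the 0x1228-byte .rel.dyn in the original table would hold 581 of them; that is an inference from the section size, not something stated in the thread.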
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-10 10:38 ` Graeme Russ
@ 2009-10-10 10:47 ` Joakim Tjernlund
2009-10-10 11:21 ` Graeme Russ
2009-10-10 16:52 ` Mike Frysinger
0 siblings, 2 replies; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-10 10:47 UTC (permalink / raw)
To: u-boot
Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
>
> On Sat, Oct 10, 2009 at 8:27 PM, Joakim Tjernlund
> <joakim.tjernlund@transmode.se> wrote:
> >
> >
> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 10:46:52:
> >>
> >> On Sat, Oct 10, 2009 at 7:07 PM, Joakim Tjernlund
> >> <joakim.tjernlund@transmode.se> wrote:
> >> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
> >> >>
> >> >> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
> >> >> <joakim.tjernlund@transmode.se> wrote:
> >> >> >>
> >> >> >> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
> >> >> >> <jwilliamcampbell@comcast.net> wrote:
> >> >> >> > Graeme Russ wrote:
> >> >> >> >>
> >> >> >> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
> >> >> >> >> <jwilliamcampbell@comcast.net> wrote:
> >> >> >> >>
> >> >> >> >>>
> >> >> >> >>> Graeme Russ wrote:
> >> >> >> >>>
> >> >> >> >>>>
> >> >> >> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
> >> >> >> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
> >> >> >> >>>> the results (fixed width font will help - its space, not tab,
> >> >> >> >>>> formatted):
> >> >> >> >>>>
> >> >> >> >>>> Section non-reloc reloc
> >> >> >> >>>> ---------------------------------------
> >> >> >> >>>> .text 000118c4 000137fc <- 0x1f38 bytes (~8kB) bigger
> >> >> >> >>>> .rodata 00005bad 000059d0
> >> >> >> >>>> .interp n/a 00000013
> >> >> >> >>>> .dynstr n/a 00000648
> >> >> >> >>>> .hash n/a 00000428
> >> >> >> >>>> .eh_frame 00003268 000034fc
> >> >> >> >>>> .data 00000a6c 000001dc
> >> >> >> >>>> .data.rel n/a 00000098
> >> >> >> >>>> .data.rel.ro.local n/a 00000178
> >> >> >> >>>> .data.rel.local n/a 000007e4
> >> >> >> >>>> .got 00000000 000001f0
> >> >> >> >>>> .got.plt n/a 0000000c
> >> >> >> >>>> .rel.got n/a 000003e0
> >> >> >> >>>> .rel.dyn n/a 00001228
> >> >> >> >>>> .dynsym n/a 00000850
> >> >> >> >>>> .dynamic n/a 00000080
> >> >> >> >>>> .u_boot_cmd 000003c0 000003c0
> >> >> >> >>>> .bss 00001a34 00001a34
> >> >> >> >>>> .realmode 00000166 00000166
> >> >> >> >>>> .bios 0000053e 0000053e
> >> >> >> >>>> =======================================
> >> >> >> >>>> Total 0001d5dd 00022287 <- 0x4caa bytes (~19kB) bigger
> >> >> >> >>>>
> >> >> >> >>>> Its more than a 16% increase in size!!!
> >> >> >> >>>>
> >> >> >> >>>> .text accounts for a little under half of the total bloat, and of that,
> >> >> >> >>>> the crude dynamic loader accounts for only 341 bytes
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>
> >> >> >> >>> Hi Graeme,
> >> >> >> >>> I would be interested in a third option (column), the x86 build with
> >> >> >> >>> just -mrelocateable but NOT -fpic. It will not be definitive because
> >> >> >> >>> there
> >> >> >> >>> will be extra code that references the GOT and missing code to do some of
> >> >> >> >>> the relocation, but it would still be interesting.
> >> >> >> >>>
> >> >> >> >>
> >> >> >> >> x86 does not have -mrelocatable. This is a PPC only option :(
> >> >> >> >>
> >> >> >> >
> >> >> >> > Hi Graeme,
> >> >> >> > You are unfortunately correct. However, I wonder if we can get
> >> >> >> > essentially the same result by executing the final ld step with the
> >> >> >> > --emit-relocs switch included. This may also include some "extra" sections
> >> >> >> > that we would want to strip out, but if it works, it could give all
> >> >> >> > ELF-based systems a way to a relocatable u-boot.
> >> >> >> >
> >> >> >>
> >> >> >> I don't think --emit-relocs is necessary with -pic. I haven't gone through
> >> >> >> all the permutations to see if there is a smaller option, but gcc -fpic and
> >> >> >> ld -pie creates enough information to perform relocation on the x86
> >> >> >> platform
> >> >> >
> >> >> > Try -fvisibility=hidden
> >> >>
> >> >> Thanks - Shaved another 2539 bytes off the binary
> >> >>
> >> >> Also found out how to get rid of .eh_frame (crept in when I upgraded to
> >> >> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
> >> >>
> >> >> Total saving of 15.6k
> >> >
> >> > Great, so now you are back at just a few percent added I guess?
> >> >
> >> >
> >>
> >> Not really - The .eh_frame saving applies to both relocated and non
> >> relocated builds
> >
> > OK, so you didn't use PIC before at all?
> >
> > Anyway I think you can do more. Using -Bsymbolic you should get
> > away with RELATIVE relocs only and be able to skip a lot of segments above.
> > Have a look at uClibc ldso/ldso/dl-startup.c
> >
> >
>
> My build options thus far are:
>
> PLATFORM_RELFLAGS += -fpie -fvisibility=hidden
> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
> PLATFORM_LDFLAGS += -pie
>
> -fpic / -pic make no difference
not on x86, on ppc it is a big difference.
>
> Interestingly, -Bsymbolic adds exactly 8 bytes to .dynamic, but doesn't
> change the size of any other section
>
> Pulling apart the relocation sections, it seems that all relocations are
> already RELATIVE even without -Bsymbolic
Ah, that is because you built an exe with -pie
Then you should be able to drop everything but the RELATIVE
from the linking, or almost in any case.
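For readers following along, handling RELATIVE-only relocations could look roughly like the sketch below. It is a minimal illustration, not code from U-Boot or uClibc: the function name, the cut-down elf32_rel struct and the bounds checks are assumptions made for the example; only the R_386_RELATIVE value (8) and the r_info layout come from the ELF i386 ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal ELF32 REL entry, laid out as in the ELF specification. */
typedef struct {
    uint32_t r_offset;   /* link-time address of the word to patch */
    uint32_t r_info;     /* symbol index (high 24 bits), type (low 8 bits) */
} elf32_rel;

#define R_386_RELATIVE     8u
#define ELF32_R_TYPE(info) ((info) & 0xffu)

/*
 * Hypothetical helper: patch every R_386_RELATIVE entry of an image that
 * was linked at link_base and is about to run at load_base.
 * Returns the number of words patched; other relocation types are skipped.
 */
size_t apply_relative_relocs(uint8_t *image, size_t image_size,
                             uint32_t link_base, uint32_t load_base,
                             const elf32_rel *rel, size_t count)
{
    uint32_t delta = load_base - link_base;
    size_t patched = 0;

    assert(image != NULL);
    assert(rel != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        uint32_t off, word;

        if (ELF32_R_TYPE(rel[i].r_info) != R_386_RELATIVE)
            continue;
        if (rel[i].r_offset < link_base)
            continue;                       /* target lies outside the image */
        off = rel[i].r_offset - link_base;
        if (image_size < sizeof word || off > image_size - sizeof word)
            continue;                       /* target lies outside the image */

        /* A RELATIVE reloc just adds the load delta to the stored word. */
        memcpy(&word, image + off, sizeof word);
        word += delta;
        memcpy(image + off, &word, sizeof word);
        patched++;
    }
    return patched;
}
```

No symbol lookup is involved, which is why a RELATIVE-only image should not need a symbol table to process its relocations.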
Jocke
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-10 10:47 ` Joakim Tjernlund
@ 2009-10-10 11:21 ` Graeme Russ
2009-10-10 15:38 ` Joakim Tjernlund
[not found] ` <4AD0B3D7.7020900@comcast.net>
2009-10-10 16:52 ` Mike Frysinger
1 sibling, 2 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-10 11:21 UTC (permalink / raw)
To: u-boot
On Sat, Oct 10, 2009 at 9:47 PM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
>
>
> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
>>
>> On Sat, Oct 10, 2009 at 8:27 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>> >
>> >
>> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 10:46:52:
>> >>
>> >> On Sat, Oct 10, 2009 at 7:07 PM, Joakim Tjernlund
>> >> <joakim.tjernlund@transmode.se> wrote:
>> >> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
>> >> >>
>> >> >> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
>> >> >> <joakim.tjernlund@transmode.se> wrote:
>> >> >> >>
>> >> >> >> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
>> >> >> >> <jwilliamcampbell@comcast.net> wrote:
>> >> >> >> > Graeme Russ wrote:
>> >> >> >> >>
>> >> >> >> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
>> >> >> >> >> <jwilliamcampbell@comcast.net> wrote:
>> >> >> >> >>
>> >> >> >> >>>
>> >> >> >> >>> Graeme Russ wrote:
>> >> >> >> >>>
>> >> >> >> >>>>
>> >> >> >> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
>> >> >> >> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>> >> >> >> >>>> the results (fixed width font will help - its space, not tab,
>> >> >> >> >>>> formatted):
>> >> >> >> >>>>
>> >> >> >> >>>> Section non-reloc reloc
>> >> >> >> >>>> ---------------------------------------
>> >> >> >> >>>> .text 000118c4 000137fc <- 0x1f38 bytes (~8kB) bigger
>> >> >> >> >>>> .rodata 00005bad 000059d0
>> >> >> >> >>>> .interp n/a 00000013
>> >> >> >> >>>> .dynstr n/a 00000648
>> >> >> >> >>>> .hash n/a 00000428
>> >> >> >> >>>> .eh_frame 00003268 000034fc
>> >> >> >> >>>> .data 00000a6c 000001dc
>> >> >> >> >>>> .data.rel n/a 00000098
>> >> >> >> >>>> .data.rel.ro.local n/a 00000178
>> >> >> >> >>>> .data.rel.local n/a 000007e4
>> >> >> >> >>>> .got 00000000 000001f0
>> >> >> >> >>>> .got.plt n/a 0000000c
>> >> >> >> >>>> .rel.got n/a 000003e0
>> >> >> >> >>>> .rel.dyn n/a 00001228
>> >> >> >> >>>> .dynsym n/a 00000850
>> >> >> >> >>>> .dynamic n/a 00000080
>> >> >> >> >>>> .u_boot_cmd 000003c0 000003c0
>> >> >> >> >>>> .bss 00001a34 00001a34
>> >> >> >> >>>> .realmode 00000166 00000166
>> >> >> >> >>>> .bios 0000053e 0000053e
>> >> >> >> >>>> =======================================
>> >> >> >> >>>> Total 0001d5dd 00022287 <- 0x4caa bytes (~19kB) bigger
>> >> >> >> >>>>
>> >> >> >> >>>> Its more than a 16% increase in size!!!
>> >> >> >> >>>>
>> >> >> >> >>>> .text accounts for a little under half of the total bloat, and of that,
>> >> >> >> >>>> the crude dynamic loader accounts for only 341 bytes
>> >> >> >> >>>>
>> >> >> >> >>>>
>> >> >> >> >>>
>> >> >> >> >>> Hi Graeme,
>> >> >> >> >>> I would be interested in a third option (column), the x86 build with
>> >> >> >> >>> just -mrelocatable but NOT -fpic. It will not be definitive because
>> >> >> >> >>> there
>> >> >> >> >>> will be extra code that references the GOT and missing code to do some of
>> >> >> >> >>> the relocation, but it would still be interesting.
>> >> >> >> >>>
>> >> >> >> >>
>> >> >> >> >> x86 does not have -mrelocatable. This is a PPC only option :(
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> > Hi Graeme,
>> >> >> >> > You are unfortunately correct. However, I wonder if we can get
>> >> >> >> > essentially the same result by executing the final ld step with the
>> >> >> >> > --emit-relocs switch included. This may also include some "extra" sections
>> >> >> >> > that we would want to strip out, but if it works, it could give all
>> >> >> >> > ELF-based systems a way to a relocatable u-boot.
>> >> >> >> >
>> >> >> >>
>> >> >> >> I don't think --emit-relocs is necessary with -pic. I haven't gone through
>> >> >> >> all the permutations to see if there is a smaller option, but gcc -fpic and
>> >> >> >> ld -pie creates enough information to perform relocation on the x86
>> >> >> >> platform
>> >> >> >
>> >> >> > Try -fvisibility=hidden
>> >> >>
>> >> >> Thanks - Shaved another 2539 bytes off the binary
>> >> >>
>> >> >> Also found out how to get rid of .eh_frame (crept in when I upgraded to
>> >> >> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
>> >> >>
>> >> >> Total saving of 15.6k
>> >> >
>> >> > Great, so now you are back at just a few percent added I guess?
>> >> >
>> >> >
>> >>
>> >> Not really - The .eh_frame saving applies to both relocated and non
>> >> relocated builds
>> >
>> > OK, so you didn't use PIC before at all?
>> >
>> > Anyway I think you can do more. Using -Bsymbolic you should get
>> > away with RELATIVE relocs only and be able to skip a lot of segments above.
>> > Have a look at uClibc ldso/ldso/dl-startup.c
>> >
>> >
>>
>> My build options thus far are:
>>
>> PLATFORM_RELFLAGS += -fpie -fvisibility=hidden
>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>> PLATFORM_LDFLAGS += -pie
>>
>> -fpic / -pic make no difference
>
> not on x86, on ppc it is a big difference.
>
>>
>> Interestingly, -Bsymbolic adds exactly 8 bytes to .dynamic, but doesn't
>> change the size of any other section
>>
>> Pulling apart the relocation sections, it seems that all relocations are
>> already RELATIVE even without -Bsymbolic
>
> Ah, that is because you built an exe with -pie
> Then you should be able to drop everything but the RELATIVE
> from the linking, or almost in any case.
>
> Jocke
>
>
Hmm, so it seems I may have hit the limit. I tried:
PLATFORM_LDFLAGS += -r --emit-relocs
but there is not enough information left to complete the relocation. It
seems as though I need .rel.got, .got.plt, .dynsym and .rel.dyn in order
to find the actual bytes that need modifying (it also seems to mess with
the size of the stripped binary for some reason)
Looks like I'll have to proceed with my original plan - a bit bloated,
but it works
Graeme
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-10 11:21 ` Graeme Russ
@ 2009-10-10 15:38 ` Joakim Tjernlund
2009-10-11 10:47 ` Graeme Russ
[not found] ` <4AD0B3D7.7020900@comcast.net>
1 sibling, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-10 15:38 UTC (permalink / raw)
To: u-boot
Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 13:21:10:
>
> On Sat, Oct 10, 2009 at 9:47 PM, Joakim Tjernlund
> <joakim.tjernlund@transmode.se> wrote:
> >
> >
> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
> >>
> >> On Sat, Oct 10, 2009 at 8:27 PM, Joakim Tjernlund
> >> <joakim.tjernlund@transmode.se> wrote:
> >> >
> >> >
> >> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 10:46:52:
> >> >>
> >> >> On Sat, Oct 10, 2009 at 7:07 PM, Joakim Tjernlund
> >> >> <joakim.tjernlund@transmode.se> wrote:
> >> >> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
> >> >> >>
> >> >> >> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
> >> >> >> <joakim.tjernlund@transmode.se> wrote:
> >> >> >> >>
> >> >> >> >> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
> >> >> >> >> <jwilliamcampbell@comcast.net> wrote:
> >> >> >> >> > Graeme Russ wrote:
> >> >> >> >> >>
> >> >> >> >> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
> >> >> >> >> >> <jwilliamcampbell@comcast.net> wrote:
> >> >> >> >> >>
> >> >> >> >> >>>
> >> >> >> >> >>> Graeme Russ wrote:
> >> >> >> >> >>>
> >> >> >> >> >>>>
> >> >> >> >> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
> >> >> >> >> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
> >> >> >> >> >>>> the results (fixed width font will help - its space, not tab,
> >> >> >> >> >>>> formatted):
> >> >> >> >> >>>>
> >> >> >> >> >>>> Section non-reloc reloc
> >> >> >> >> >>>> ---------------------------------------
> >> >> >> >> >>>> .text 000118c4 000137fc <- 0x1f38 bytes (~8kB) bigger
> >> >> >> >> >>>> .rodata 00005bad 000059d0
> >> >> >> >> >>>> .interp n/a 00000013
> >> >> >> >> >>>> .dynstr n/a 00000648
> >> >> >> >> >>>> .hash n/a 00000428
> >> >> >> >> >>>> .eh_frame 00003268 000034fc
> >> >> >> >> >>>> .data 00000a6c 000001dc
> >> >> >> >> >>>> .data.rel n/a 00000098
> >> >> >> >> >>>> .data.rel.ro.local n/a 00000178
> >> >> >> >> >>>> .data.rel.local n/a 000007e4
> >> >> >> >> >>>> .got 00000000 000001f0
> >> >> >> >> >>>> .got.plt n/a 0000000c
> >> >> >> >> >>>> .rel.got n/a 000003e0
> >> >> >> >> >>>> .rel.dyn n/a 00001228
> >> >> >> >> >>>> .dynsym n/a 00000850
> >> >> >> >> >>>> .dynamic n/a 00000080
> >> >> >> >> >>>> .u_boot_cmd 000003c0 000003c0
> >> >> >> >> >>>> .bss 00001a34 00001a34
> >> >> >> >> >>>> .realmode 00000166 00000166
> >> >> >> >> >>>> .bios 0000053e 0000053e
> >> >> >> >> >>>> =======================================
> >> >> >> >> >>>> Total 0001d5dd 00022287 <- 0x4caa bytes (~19kB) bigger
> >> >> >> >> >>>>
> >> >> >> >> >>>> Its more than a 16% increase in size!!!
> >> >> >> >> >>>>
> >> >> >> >> >>>> .text accounts for a little under half of the total bloat, and of that,
> >> >> >> >> >>>> the crude dynamic loader accounts for only 341 bytes
> >> >> >> >> >>>>
> >> >> >> >> >>>>
> >> >> >> >> >>>
> >> >> >> >> >>> Hi Graeme,
> >> >> >> >> >>> I would be interested in a third option (column), the x86 build with
> >> >> >> >> >>> just -mrelocatable but NOT -fpic. It will not be definitive because
> >> >> >> >> >>> there
> >> >> >> >> >>> will be extra code that references the GOT and missing code to do some of
> >> >> >> >> >>> the relocation, but it would still be interesting.
> >> >> >> >> >>>
> >> >> >> >> >>
> >> >> >> >> >> x86 does not have -mrelocatable. This is a PPC only option :(
> >> >> >> >> >>
> >> >> >> >> >
> >> >> >> >> > Hi Graeme,
> >> >> >> >> > You are unfortunately correct. However, I wonder if we can get
> >> >> >> >> > essentially the same result by executing the final ld step with the
> >> >> >> >> > --emit-relocs switch included. This may also include some "extra" sections
> >> >> >> >> > that we would want to strip out, but if it works, it could give all
> >> >> >> >> > ELF-based systems a way to a relocatable u-boot.
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> I don't think --emit-relocs is necessary with -pic. I haven't gone through
> >> >> >> >> all the permutations to see if there is a smaller option, but gcc -fpic and
> >> >> >> >> ld -pie creates enough information to perform relocation on the x86
> >> >> >> >> platform
> >> >> >> >
> >> >> >> > Try -fvisibility=hidden
> >> >> >>
> >> >> >> Thanks - Shaved another 2539 bytes off the binary
> >> >> >>
> >> >> >> Also found out how to get rid of .eh_frame (crept in when I upgraded to
> >> >> >> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
> >> >> >>
> >> >> >> Total saving of 15.6k
> >> >> >
> >> >> > Great, so now you are back at just a few percent added I guess?
> >> >> >
> >> >> >
> >> >>
> >> >> Not really - The .eh_frame saving applies to both relocated and non
> >> >> relocated builds
> >> >
> >> > OK, so you didn't use PIC before at all?
> >> >
> >> > Anyway I think you can do more. Using -Bsymbolic you should get
> >> > away with RELATIVE relocs only and be able to skip a lot of segments above.
> >> > Have a look at uClibc ldso/ldso/dl-startup.c
> >> >
> >> >
> >>
> >> My build options thus far are:
> >>
> >> PLATFORM_RELFLAGS += -fpie -fvisibility=hidden
> >> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
> >> PLATFORM_LDFLAGS += -pie
> >>
> >> -fpic / -pic make no difference
> >
> > not on x86, on ppc it is a big difference.
> >
> >>
> >> Interestingly, -Bsymbolic adds exactly 8 bytes to .dynamic, but doesn't
> >> change the size of any other section
> >>
> >> Pulling apart the relocation sections, it seems that all relocations are
> >> already RELATIVE even without -Bsymbolic
> >
> > Ah, that is because you built an exe with -pie
> > Then you should be able to drop everything but the RELATIVE
> > from the linking, or almost in any case.
> >
> > Jocke
> >
> >
>
> Hmm, so it seems I may have hit the limit. I tried:
>
> PLATFORM_LDFLAGS += -r --emit-relocs
>
> but there is not enough information left to complete the relocation. It
> seems as though I need .rel.got, .got.plt, .dynsym and .rel.dyn in order
> to find the actual bytes that need modifying (it also seems to mess with
> the size of the stripped binary for some reason)
>
> Looks like I'll have to proceed with my original plan - a bit bloated,
> but it works
Relocation costs :(
I am not sure why you need .got.plt, it should be empty,
what is in it?
Same with dynsym, what is in it?
Memory fails me, but since u-boot is a freestanding app, I think
these two might not be needed. Perhaps there are weak unresolved
syms in there?
Jocke
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-10 10:47 ` Joakim Tjernlund
2009-10-10 11:21 ` Graeme Russ
@ 2009-10-10 16:52 ` Mike Frysinger
2009-10-10 17:45 ` Joakim Tjernlund
1 sibling, 1 reply; 47+ messages in thread
From: Mike Frysinger @ 2009-10-10 16:52 UTC (permalink / raw)
To: u-boot
On Saturday 10 October 2009 06:47:42 Joakim Tjernlund wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
> > -fpic / -pic make no difference
>
> not on x86, on ppc it is a big difference.
i think you guys mean -fpic and -fPIC because there is no -pic flag ... while
the two make a big diff on some arches like ppc, they make pretty much no
difference on x86 last i looked
-mike
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-10 16:52 ` Mike Frysinger
@ 2009-10-10 17:45 ` Joakim Tjernlund
2009-10-11 0:43 ` Graeme Russ
0 siblings, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-10 17:45 UTC (permalink / raw)
To: u-boot
Mike Frysinger <vapier@gentoo.org> wrote on 10/10/2009 18:52:29:
>
> On Saturday 10 October 2009 06:47:42 Joakim Tjernlund wrote:
> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
> > > -fpic / -pic make no difference
> >
> > not on x86, on ppc it is a big difference.
>
> i think you guys mean -fpic and -fPIC because there is no -pic flag ... while
> the two make a big diff on some arches like ppc, they make pretty much no
> difference on x86 last i looked
Yes, this was what I was thinking (-fpic vs. -fPIC). These will probably only
make a difference on RISC-like arches.
Jocke
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-10 17:45 ` Joakim Tjernlund
@ 2009-10-11 0:43 ` Graeme Russ
0 siblings, 0 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-11 0:43 UTC (permalink / raw)
To: u-boot
On Sun, Oct 11, 2009 at 4:45 AM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
> Mike Frysinger <vapier@gentoo.org> wrote on 10/10/2009 18:52:29:
>>
>> On Saturday 10 October 2009 06:47:42 Joakim Tjernlund wrote:
>> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
>> > > -fpic / -pic make no difference
>> >
>> > not on x86, on ppc it is a big difference.
>>
>> i think you guys mean -fpic and -fPIC because there is no -pic flag ... while
>> the two make a big diff on some arches like ppc, they make pretty much no
>> difference on x86 last i looked
Sorry for the confusion - by -fpic / -pic I was referring to -fpic (gcc) /
-pic (ld) flags versus -fpie (gcc) / -pie (ld) flags.
>
> Yes, this was what I was thinking(-fpic vs. -fPIC). These will probably only
> make a difference on RISC like arches.
>
There appears to be no difference (on x86) between pic, PIC, and pie. The
big difference is when I drop ld's -pic and use ld's --emit-relocs instead
> Jocke
>
>
Regards,
Graeme
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
[not found] ` <4AD0B3D7.7020900@comcast.net>
@ 2009-10-11 1:31 ` Graeme Russ
0 siblings, 0 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-11 1:31 UTC (permalink / raw)
To: u-boot
On Sun, Oct 11, 2009 at 3:18 AM, J. William Campbell
<jwilliamcampbell@comcast.net> wrote:
> Graeme Russ wrote:
>>
>> On Sat, Oct 10, 2009 at 9:47 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>>
>>>
>>> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
>>>
>>>>
>>>> On Sat, Oct 10, 2009 at 8:27 PM, Joakim Tjernlund
>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>
>>>>>
>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 10:46:52:
>>>>>
>>>>>>
>>>>>> On Sat, Oct 10, 2009 at 7:07 PM, Joakim Tjernlund
>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>
>>>>>>>
>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
>>>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
>>>>>>>>>> <jwilliamcampbell@comcast.net> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Graeme Russ wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
>>>>>>>>>>>> <jwilliamcampbell@comcast.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Graeme Russ wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Out of curiosity, I wanted to see just how much of a size
>>>>>>>>>>>>>> penalty I am
>>>>>>>>>>>>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build.
>>>>>>>>>>>>>> Here are
>>>>>>>>>>>>>> the results (fixed width font will help - its space, not tab,
>>>>>>>>>>>>>> formatted):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Section non-reloc reloc
>>>>>>>>>>>>>> ---------------------------------------
>>>>>>>>>>>>>> .text 000118c4 000137fc <- 0x1f38 bytes (~8kB)
>>>>>>>>>>>>>> bigger
>>>>>>>>>>>>>> .rodata 00005bad 000059d0
>>>>>>>>>>>>>> .interp n/a 00000013
>>>>>>>>>>>>>> .dynstr n/a 00000648
>>>>>>>>>>>>>> .hash n/a 00000428
>>>>>>>>>>>>>> .eh_frame 00003268 000034fc
>>>>>>>>>>>>>> .data 00000a6c 000001dc
>>>>>>>>>>>>>> .data.rel n/a 00000098
>>>>>>>>>>>>>> .data.rel.ro.local n/a 00000178
>>>>>>>>>>>>>> .data.rel.local n/a 000007e4
>>>>>>>>>>>>>> .got 00000000 000001f0
>>>>>>>>>>>>>> .got.plt n/a 0000000c
>>>>>>>>>>>>>> .rel.got n/a 000003e0
>>>>>>>>>>>>>> .rel.dyn n/a 00001228
>>>>>>>>>>>>>> .dynsym n/a 00000850
>>>>>>>>>>>>>> .dynamic n/a 00000080
>>>>>>>>>>>>>> .u_boot_cmd 000003c0 000003c0
>>>>>>>>>>>>>> .bss 00001a34 00001a34
>>>>>>>>>>>>>> .realmode 00000166 00000166
>>>>>>>>>>>>>> .bios 0000053e 0000053e
>>>>>>>>>>>>>> =======================================
>>>>>>>>>>>>>> Total 0001d5dd 00022287 <- 0x4caa bytes
>>>>>>>>>>>>>> (~19kB) bigger
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Its more than a 16% increase in size!!!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> .text accounts for a little under half of the total bloat, and
>>>>>>>>>>>>>> of that,
>>>>>>>>>>>>>> the crude dynamic loader accounts for only 341 bytes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Graeme,
>>>>>>>>>>>>> I would be interested in a third option (column), the x86
>>>>>>>>>>>>> build with
>>>>>>>>>>>>> just -mrelocatable but NOT -fpic. It will not be definitive
>>>>>>>>>>>>> because
>>>>>>>>>>>>> there
>>>>>>>>>>>>> will be extra code that references the GOT and missing code to
>>>>>>>>>>>>> do some of
>>>>>>>>>>>>> the relocation, but it would still be interesting.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> x86 does not have -mrelocatable. This is a PPC only option :(
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Graeme,
>>>>>>>>>>> You are unfortunately correct. However, I wonder if we
>>>>>>>>>>> can get
>>>>>>>>>>> essentially the same result by executing the final ld step with
>>>>>>>>>>> the
>>>>>>>>>>> --emit-relocs switch included. This may also include some "extra"
>>>>>>>>>>> sections
>>>>>>>>>>> that we would want to strip out, but if it works, it could give
>>>>>>>>>>> all
>>>>>>>>>>> ELF-based systems a way to a relocatable u-boot.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I don't think --emit-relocs is necessary with -pic. I haven't gone
>>>>>>>>>> through
>>>>>>>>>> all the permutations to see if there is a smaller option, but gcc
>>>>>>>>>> -fpic and
>>>>>>>>>> ld -pie creates enough information to perform relocation on the
>>>>>>>>>> x86
>>>>>>>>>> platform
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Try -fvisibility=hidden
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks - Shaved another 2539 bytes off the binary
>>>>>>>>
>>>>>>>> Also found out how to get rid of .eh_frame (crept in when I upgraded
>>>>>>>> to
>>>>>>>> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452
>>>>>>>> bytes
>>>>>>>>
>>>>>>>> Total saving of 15.6k
>>>>>>>>
>>>>>>>
>>>>>>> Great, so now you are back at just a few percent added I guess?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Not really - The .eh_frame saving applies to both relocated and non
>>>>>> relocated builds
>>>>>>
>>>>>
>>>>> OK, so you didn't use PIC before at all?
>>>>>
>>>>> Anyway I think you can do more. Using -Bsymbolic you should get
>>>>> away with RELATIVE relocs only and be able to skip a lot of segments
>>>>> above.
>>>>> Have a look at uClibc ldso/ldso/dl-startup.c
>>>>>
>>>>>
>>>>>
>>>>
>>>> My build options thus far are:
>>>>
>>>> PLATFORM_RELFLAGS += -fpie -fvisibility=hidden
>>>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>>>> PLATFORM_LDFLAGS += -pie
>>>>
>>>> -fpic / -pic make no difference
>>>>
>>>
>>> not on x86, on ppc it is a big difference.
>>>
>>>
>>>>
>>>> Interestingly, -Bsymbolic adds exactly 8 bytes to .dynamic, but doesn't
>>>> change the size of any other section
>>>>
>>>> Pulling apart the relocation sections, it seems that all relocations are
>>>> already RELATIVE even without -Bsymbolic
>>>>
>>>
>>> Ah, that is because you built an exe with -pie
>>> Then you should be able to drop everything but the RELATIVE
>>> from the linking, or almost in any case.
>>>
>>> Jocke
>>>
>>>
>>>
>>
>> Hmm, so it seems I may have hit the limit. I tried:
>>
>> PLATFORM_LDFLAGS += -r --emit-relocs
>>
>> but there is not enough information left to complete the relocation.
>
> Hi Graeme,
> I am glad you tried this. It should work, -fpie should not be necessary.
> Did you also change PLATFORM_RELFLAGS to omit the -fpie? Without pie, and
> with no libraries linked in that are pie, there should BE no .got, AFIK. I
> wonder if absolutely everything is getting re-built, like maybe there is a
> library routine that is being linked in? What exactly was missing when you
> compiled and linked without pie?
I just tried with:
PLATFORM_RELFLAGS += -fvisibility=hidden
PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
PLATFORM_LDFLAGS += -Bsymbolic --emit-relocs
There is relocation information in the linker output; however, it is not
marked for allocation, so it gets stripped out when creating u-boot.bin
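To make the "not marked for allocation" point concrete, here is a small sketch of the check involved. It is only an illustration under stated assumptions: the section_info struct and the function name are made up for the example, and the idea that a flat binary keeps only sections carrying SHF_ALLOC is a simplification (.bss, for instance, is allocated but contributes no file bytes). SHF_ALLOC = 0x2 is the value from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHF_ALLOC 0x2u   /* ELF section flag: section occupies memory at run time */

/* Hypothetical, cut-down view of a section header for this example. */
typedef struct {
    const char *name;
    uint32_t    sh_flags;
    uint32_t    sh_size;
} section_info;

/*
 * Sum the sizes of all sections that lack SHF_ALLOC, i.e. the bytes that
 * would not make it into a flat image built from allocated sections only.
 */
size_t bytes_lost_in_flat_binary(const section_info *sec, size_t count)
{
    size_t lost = 0;

    assert(sec != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if ((sec[i].sh_flags & SHF_ALLOC) == 0)
            lost += sec[i].sh_size;
    }
    return lost;
}
```

Running readelf -S on the linked ELF shows the same flag as an "A" in the Flg column, which is a quick way to see which relocation sections would survive into the binary image.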
>
> Best Regards,
> Bill Campbell
>>
>> It
>> seems as though I need .rel.got, .got.plt, .dynsym and .rel.dyn in order
>> to find the actual bytes that need modifying (it also seems to mess with
>> the size of the stripped binary for some reason)
>>
>> Looks like I'll have to proceed with my original plan - a bit bloated,
>> but it works
>>
>> Graeme
>>
>>
>>
>
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-10 15:38 ` Joakim Tjernlund
@ 2009-10-11 10:47 ` Graeme Russ
[not found] ` <OF83D1271F.04B67606-ONC125764C.0045BFF2-C125764C.0046AC45@transmode.se>
0 siblings, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-11 10:47 UTC (permalink / raw)
To: u-boot
On Sun, Oct 11, 2009 at 2:38 AM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 13:21:10:
>>
>> On Sat, Oct 10, 2009 at 9:47 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>> >
>> >
>> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
>> >>
>> >> On Sat, Oct 10, 2009 at 8:27 PM, Joakim Tjernlund
>> >> <joakim.tjernlund@transmode.se> wrote:
>> >> >
>> >> >
>> >> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 10:46:52:
>> >> >>
>> >> >> On Sat, Oct 10, 2009 at 7:07 PM, Joakim Tjernlund
>> >> >> <joakim.tjernlund@transmode.se> wrote:
>> >> >> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
>> >> >> >>
>> >> >> >> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
>> >> >> >> <joakim.tjernlund@transmode.se> wrote:
>> >> >> >> >>
>> >> >> >> >> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
>> >> >> >> >> <jwilliamcampbell@comcast.net> wrote:
>> >> >> >> >> > Graeme Russ wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
>> >> >> >> >> >> <jwilliamcampbell@comcast.net> wrote:
>> >> >> >> >> >>
>> >> >> >> >> >>>
>> >> >> >> >> >>> Graeme Russ wrote:
>> >> >> >> >> >>>
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
>> >> >> >> >> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>> >> >> >> >> >>>> the results (fixed width font will help - its space, not tab,
>> >> >> >> >> >>>> formatted):
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> Section non-reloc reloc
>> >> >> >> >> >>>> ---------------------------------------
>> >> >> >> >> >>>> .text 000118c4 000137fc <- 0x1f38 bytes (~8kB) bigger
>> >> >> >> >> >>>> .rodata 00005bad 000059d0
>> >> >> >> >> >>>> .interp n/a 00000013
>> >> >> >> >> >>>> .dynstr n/a 00000648
>> >> >> >> >> >>>> .hash n/a 00000428
>> >> >> >> >> >>>> .eh_frame 00003268 000034fc
>> >> >> >> >> >>>> .data 00000a6c 000001dc
>> >> >> >> >> >>>> .data.rel n/a 00000098
>> >> >> >> >> >>>> .data.rel.ro.local n/a 00000178
>> >> >> >> >> >>>> .data.rel.local n/a 000007e4
>> >> >> >> >> >>>> .got 00000000 000001f0
>> >> >> >> >> >>>> .got.plt n/a 0000000c
>> >> >> >> >> >>>> .rel.got n/a 000003e0
>> >> >> >> >> >>>> .rel.dyn n/a 00001228
>> >> >> >> >> >>>> .dynsym n/a 00000850
>> >> >> >> >> >>>> .dynamic n/a 00000080
>> >> >> >> >> >>>> .u_boot_cmd 000003c0 000003c0
>> >> >> >> >> >>>> .bss 00001a34 00001a34
>> >> >> >> >> >>>> .realmode 00000166 00000166
>> >> >> >> >> >>>> .bios 0000053e 0000053e
>> >> >> >> >> >>>> =======================================
>> >> >> >> >> >>>> Total 0001d5dd 00022287 <- 0x4caa bytes (~19kB) bigger
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> Its more than a 16% increase in size!!!
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> .text accounts for a little under half of the total bloat, and of that,
>> >> >> >> >> >>>> the crude dynamic loader accounts for only 341 bytes
>> >> >> >> >> >>>>
>> >> >> >> >> >>>>
>> >> >> >> >> >>>
>> >> >> >> >> >>> Hi Graeme,
>> >> >> >> >> >>> I would be interested in a third option (column), the x86 build with
>> >> >> >> >> >>> just -mrelocatable but NOT -fpic. It will not be definitive because
>> >> >> >> >> >>> there
>> >> >> >> >> >>> will be extra code that references the GOT and missing code to do some of
>> >> >> >> >> >>> the relocation, but it would still be interesting.
>> >> >> >> >> >>>
>> >> >> >> >> >>
>> >> >> >> >> >> x86 does not have -mrelocatable. This is a PPC only option :(
>> >> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> > Hi Graeme,
>> >> >> >> >> > You are unfortunately correct. However, I wonder if we can get
>> >> >> >> >> > essentially the same result by executing the final ld step with the
>> >> >> >> >> > --emit-relocs switch included. This may also include some "extra" sections
>> >> >> >> >> > that we would want to strip out, but if it works, it could give all
>> >> >> >> >> > ELF-based systems a way to a relocatable u-boot.
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> I don't think --emit-relocs is necessary with -pic. I haven't gone through
>> >> >> >> >> all the permutations to see if there is a smaller option, but gcc -fpic and
>> >> >> >> >> ld -pie creates enough information to perform relocation on the x86
>> >> >> >> >> platform
>> >> >> >> >
>> >> >> >> > Try -fvisibility=hidden
>> >> >> >>
>> >> >> >> Thanks - Shaved another 2539 bytes off the binary
>> >> >> >>
>> >> >> >> Also found out how to get rid of .eh_frame (crept in when I upgraded to
>> >> >> >> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
>> >> >> >>
>> >> >> >> Total saving of 15.6k
>> >> >> >
>> >> >> > Great, so now you are back at just a few percent added I guess?
>> >> >> >
>> >> >> >
>> >> >>
>> >> >> Not really - The .eh_frame saving applies to both relocated and non
>> >> >> relocated builds
>> >> >
>> >> > OK, so you didn't use PIC before at all?
>> >> >
>> >> > Anyway I think you can do more. Using -Bsymbolic you should get
>> >> > away with RELATIVE relocs only and be able to skip a lot of segments above.
>> >> > Have a look at uClibc ldso/ldso/dl-startup.c
>> >> >
>> >> >
>> >>
>> >> My build options thus far are:
>> >>
>> >> PLATFORM_RELFLAGS += -fpie -fvisibility=hidden
>> >> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>> >> PLATFORM_LDFLAGS += -pie
>> >>
>> >> -fpic / -pic make no difference
>> >
>> > not on x86, on ppc it is a big difference.
>> >
>> >>
>> >> Interestingly, -Bsymbolic adds exactly 8 bytes to .dynamic, but doesn't
>> >> change the size of any other section
>> >>
>> >> Pulling apart the relocation sections, it seems that all relocations are
>> >> already RELATIVE even without -Bsymbolic
>> >
>> > Ah, that is because you built an exe with -pie
>> > Then you should be able to drop everything but the RELATIVE
>> > from the linking, or almost in any case.
>> >
>> > Jocke
>> >
>> >
>>
>> Hmm, so it seems I may have hit the limit. I tried:
>>
>> PLATFORM_LDFLAGS += -r --emit-relocs
>>
>> but there is not enough information left to complete the relocation. It
>> seems as though I need .rel.got, .got.plt, .dynsym and .rel.dyn in order
>> to find the actual bytes that need modifying (it also seems to mess with
>> the size of the stripped binary for some reason)
>>
>> Looks like I'll have to proceed with my original plan - a bit bloated,
>> but it works
>
> Relocation costs :(
>
> I am not sure why you need .got.plt, it should be empty,
> what is in it?
> Same with dynsym, what is in it?
>
> Memory fails me, but since u-boot is a freestanding app, I think
> these two might not be needed. Perhaps there are weak unresolved
> syms in there?
>
> Jocke
>
>
Well, I'm in the middle of a pretty intense analysis of what is going on.
Compile flags are:
PLATFORM_RELFLAGS += -fpic -fvisibility=hidden
PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
PLATFORM_LDFLAGS += -pic -Bsymbolic
So far I have found that the only sections that have changes as a result
of a change in TEXT_BASE are:
.text
.rodata
.data.rel
.got
.got.plt
.rel.text
.rel.got
.rel.dyn
.dynsym
.dynamic
.u_boot_cmd
Changes in .text are covered by .rel.text (see below) or as a result of
CONFIG_SYS_MONITOR_BASE being equal to TEXT_BASE (used in cfi_flash.c)
Changes in .rodata are a result of version_string changing for each
compile
.rel.text
- Contains a list of pointers into .got
- All entries are R_386_RELATIVE
- All 8 entries are in cpu/i386/start.o
- cpu/i386/start.o is only used during initial bootstrap - not needed
after execution starts in RAM
- Can be safely discarded
.rel.got
- Contains a list of pointers into .got
- All entries are R_386_RELATIVE
- Not all entries change with TEXT_BASE. Some entries are symbols
exported from the linker script (in particular section size
exports) while the others are in the somewhat 'special' BIOS and
Real Mode sections which are located in a fixed RAM location (these
sections are used for real-mode trampolining into Linux by providing
a limited PC 'BIOS')
- All entries that are not linked to TEXT_BASE are easily identified
because they are 'located' below TEXT_BASE (specifically between
0x00000000 and 0x00001A34)
- This section is not needed in the final binary - Direct processing
of .got will achieve the required end result
.rel.dyn
- Contains a list of pointers into .data.rel and .u_boot_cmd
- Like .rel.got, not all entries in .data.rel need relocating. Again,
like .rel.got, these are easily identified
- This section is not needed
Another 5.5k saved
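To make the "direct processing of .got" idea concrete, here is a minimal sketch in C. It is not code from the U-Boot tree: the function name fixup_got, the image_end upper bound and the sample addresses in the usage note are assumptions made for illustration. It only shows the rule described above, namely that slots holding an address inside the relocated image get the load offset added, while slots below TEXT_BASE (size exports, fixed BIOS / real-mode addresses) are left alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add `delta` to every .got slot whose link-time value lies inside the
 * relocated image, i.e. in [text_base, image_end). Slots outside that
 * range (linker-script size exports, fixed-address sections) are not
 * touched. Returns the number of slots that were patched.
 * All names are hypothetical; this is a host-side sketch only.
 */
static size_t fixup_got(uint32_t *got, size_t nslots,
                        uint32_t text_base, uint32_t image_end,
                        uint32_t delta)
{
    size_t patched = 0;

    assert(got != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++) {
        if (got[i] >= text_base && got[i] < image_end) {
            got[i] += delta;
            patched++;
        }
    }
    return patched;
}
```

With a made-up TEXT_BASE of 0x38040000 and a load offset of 0x00100000, a slot holding 0x38059994 would become 0x38159994, while a slot holding 0x00001A34 would stay as it is.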
So, all that is left are .dynsym and .dynamic ...
.dynsym
- Contains 70 entries (16 bytes each, 1120 bytes)
- 44 entries mimic those entries in .got which are not relocated
- 21 entries are the remaining symbols exported from the linker
script
- 4 entries are labels defined in inline asm and used in C
- 1 entry is a NULL entry
.dynamic
- 0x88 bytes (136 decimal)
- Array of Elf32_Dyn
- typedef struct {
Elf32_Sword d_tag;
union {
Elf32_Word d_val;
Elf32_Addr d_ptr;
} d_un;
} Elf32_Dyn;
- 0x11 entries
[00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
[01] 0x00000004, 0x38059994 DT_HASH, points to .hash
[02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
[03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
[04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
[05] 0x0000000B, 0x00000010 DT_SYMENT, ???
[06] 0x00000015, 0x00000000 DT_DEBUG, ???
[07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
[08] 0x00000012, 0x000014D8 DT_RELSZ, ???
[09] 0x00000013, 0x00000008 DT_RELENT, ???
[0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
[0b] 0x6FFFFFFA, 0x00000236 ???, Entries in .rel.dyn
[0c] 0x00000000, 0x00000000 DT_NULL, End of Array
[0d] 0x00000000, 0x00000000 DT_NULL, End of Array
[0e] 0x00000000, 0x00000000 DT_NULL, End of Array
[0f] 0x00000000, 0x00000000 DT_NULL, End of Array
[10] 0x00000000, 0x00000000 DT_NULL, End of Array
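The tag values in the dump above can be decoded mechanically. The sketch below is a minimal, host-side illustration in C: the type dyn32_t and the helpers dyn_tag_name and dyn_lookup are hypothetical names, and the struct is a local stand-in for Elf32_Dyn so that the example does not depend on <elf.h>. The tag numbers themselves are the standard ELF ones; in particular 0x6FFFFFFA is DT_RELCOUNT, the number of R_386_RELATIVE entries at the start of the relocation table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for Elf32_Dyn so the sketch does not need <elf.h>. */
typedef struct {
    int32_t  d_tag;
    uint32_t d_val;   /* d_un.d_val and d_un.d_ptr share these 32 bits */
} dyn32_t;

/* Map a d_tag value to its ELF name; returns "unknown" for other tags. */
static const char *dyn_tag_name(int32_t tag)
{
    switch (tag) {
    case 0x00:       return "DT_NULL";
    case 0x04:       return "DT_HASH";
    case 0x05:       return "DT_STRTAB";
    case 0x06:       return "DT_SYMTAB";
    case 0x0a:       return "DT_STRSZ";
    case 0x0b:       return "DT_SYMENT";    /* bytes per .dynsym entry */
    case 0x10:       return "DT_SYMBOLIC";
    case 0x11:       return "DT_REL";
    case 0x12:       return "DT_RELSZ";     /* total bytes in the DT_REL table */
    case 0x13:       return "DT_RELENT";    /* bytes per DT_REL entry */
    case 0x15:       return "DT_DEBUG";
    case 0x16:       return "DT_TEXTREL";
    case 0x6ffffffa: return "DT_RELCOUNT";  /* count of R_386_RELATIVE relocs */
    default:         return "unknown";
    }
}

/*
 * Walk the array up to the first DT_NULL and store the value for `tag`
 * in *out. Returns 1 if the tag was found, 0 otherwise.
 */
static int dyn_lookup(const dyn32_t *dyn, int32_t tag, uint32_t *out)
{
    assert(dyn != NULL && out != NULL);

    for (; dyn->d_tag != 0; dyn++) {
        if (dyn->d_tag == tag) {
            *out = dyn->d_val;
            return 1;
        }
    }
    return 0;
}
```

Applied to the values listed above, DT_RELSZ / DT_RELENT = 0x14D8 / 8 gives 667 relocation entries in total, of which DT_RELCOUNT = 0x236 (566) are RELATIVE.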
I think some more investigation into the need for .dynsym and .dynamic is
still required...
Regards,
Graeme
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
[not found] ` <OF83D1271F.04B67606-ONC125764C.0045BFF2-C125764C.0046AC45@transmode.se>
@ 2009-10-13 11:21 ` Graeme Russ
2009-10-13 11:53 ` Joakim Tjernlund
0 siblings, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-13 11:21 UTC (permalink / raw)
To: u-boot
On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
[Massive Snip :)]
>
>>
>> So, all that is left are .dynsym and .dynamic ...
>> .dynsym
>> - Contains 70 entries (16 bytes each, 1120 bytes)
>> - 44 entries mimic those entries in .got which are not relocated
>> - 21 entries are the remaining symbols exported from the linker
>> script
>> - 4 entries are labels defined in inline asm and used in C
> Try adding proper asm declarations. Look at what gcc
> generates for a function/variable and mimic these.
Thanks - Now .dynsym contains only exports from the linker script
>
>> - 1 entry is a NULL entry
>>
>> .dynamic
>> - 88 bytes
>> - Array of Elf32_Dyn
>> - typedef struct {
>> Elf32_Sword d_tag;
>> union {
>> Elf32_Word d_val;
>> Elf32_Addr d_ptr;
>> } d_un;
>> } Elf32_Dyn;
>> - 0x11 entries
>> [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>> [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>> [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>> [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>> [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>> [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>> [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>> [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>> [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
> How big DT_REL is
>> [09] 0x00000013, 0x00000008 DT_RELENT, ???
> hmm, cannot remember :)
How big an entry in DT_REL is
>> [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
> Oops, you got text relocations. This is generally a bad thing.
> TEXTREL is commonly caused by asm code that isn't truly PIC, so it needs
> to modify the .text segment to adjust for relocation.
> You should get rid of this one. Look for DT_TEXTREL in .o files to find
> the culprit.
>
Alas I cannot - The relocations are a result of loading a register with a
return address when calling show_boot_progress in the very early stages of
initialisation prior to the stack becoming available. The x86 does not
allow direct access to the IP so the only way to find the 'current
execution address' is to 'call' to the next instruction and pop the return
address off the stack
This is not a problem because this is very low-level init that is not
called once relocated into RAM - These relocations can be safely ignored
>> [0b] 0x6FFFFFFA, 0x00000236 ???, Entries in .rel.dyn
>> [0c] 0x00000000, 0x00000000 DT_NULL, End of Array
>> [0d] 0x00000000, 0x00000000 DT_NULL, End of Array
>> [0e] 0x00000000, 0x00000000 DT_NULL, End of Array
>> [0f] 0x00000000, 0x00000000 DT_NULL, End of Array
>> [10] 0x00000000, 0x00000000 DT_NULL, End of Array
>>
>> I think some more investigation into the need for .dynsym and .dynamic is
>> still required...
.dynsym may still be required if only for accessing the __u_boot_cmd
structure. However, I may be able to hack that a little and not create a
__u_boot_cmd symbol in the linker script (create some other temporary
symbol) and populate __u_boot_cmd with a valid value after relocation. It
will look a little weird, but may mean not loading this section into RAM
Other than that, .dynsym is now only needed to locate the sections during
the relocation phase and can be kept in flash and not copied to RAM
I don't think .dynamic is needed due to the exporting of section addresses
from the linker script
Regards,
Graeme
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-13 11:21 ` Graeme Russ
@ 2009-10-13 11:53 ` Joakim Tjernlund
2009-10-13 16:30 ` J. William Campbell
2009-10-13 20:06 ` Graeme Russ
0 siblings, 2 replies; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-13 11:53 UTC (permalink / raw)
To: u-boot
Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
> <joakim.tjernlund@transmode.se> wrote:
> > Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>
> [Massive Snip :)]
>
> >
> >>
> >> So, all that is left are .dynsym and .dynamic ...
> >> .dynsym
> >> - Contains 70 entries (16 bytes each, 1120 bytes)
> >> - 44 entries mimic those entries in .got which are not relocated
> >> - 21 entries are the remaining symbols exported from the linker
> >> script
> >> - 4 entries are labels defined in inline asm and used in C
> > Try adding proper asm declarations. Look at what gcc
> > generates for a function/variable and mimic these.
>
> Thanks - Now .dynsym contains only exports from the linker script
:)
>
> >
> >> - 1 entry is a NULL entry
> >>
> >> .dynamic
> >> - 88 bytes
> >> - Array of Elf32_Dyn
> >> - typedef struct {
> >> Elf32_Sword d_tag;
> >> union {
> >> Elf32_Word d_val;
> >> Elf32_Addr d_ptr;
> >> } d_un;
> >> } Elf32_Dyn;
> >> - 0x11 entries
> >> [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
> >> [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
> >> [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
> >> [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
> >> [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
> >> [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
> >> [06] 0x00000015, 0x00000000 DT_DEBUG, ???
> >> [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
> >> [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
> > How big DT_REL is
> >> [09] 0x00000013, 0x00000008 DT_RELENT, ???
> > > hmm, cannot remember :)
>
> How big an entry in DT_REL is
Right, how could I forget :)
>
> >> [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
> > Oops, you got text relocations. This is generally a bad thing.
> > > TEXTREL is commonly caused by asm code that isn't truly PIC, so it needs
> > to modify the .text segment to adjust for relocation.
> > You should get rid of this one. Look for DT_TEXTREL in .o files to find
> > the culprit.
> >
>
> Alas I cannot - The relocations are a result of loading a register with a
> return address when calling show_boot_progress in the very early stages of
> initialisation prior to the stack becoming available. The x86 does not
> allow direct access to the IP so the only way to find the 'current
> execution address' is to 'call' to the next instruction and pop the return
> address off the stack
hmm, same as ppc but that in itself should not cause a TEXTREL, should it?
Ahh, the 'call' is absolute, not relative? I guess there is some way around it
but it is not important ATM I guess.
Evil idea, skip -fpic et al. and add the full reloc procedure
to relocate by rewriting directly in the TEXT segment. Then you save space
but you need more relocation code. Something like dl_do_reloc from
uClibc. Wonder how much extra code that would be? Not too much I think.
>
> This is not a problem because this is very low-level init that is not
> called once relocated into RAM - These relocations can be safely ignored
>
> >> [0b] 0x6FFFFFFA, 0x00000236 ???, Entries in .rel.dyn
> >> [0c] 0x00000000, 0x00000000 DT_NULL, End of Array
> >> [0d] 0x00000000, 0x00000000 DT_NULL, End of Array
> >> [0e] 0x00000000, 0x00000000 DT_NULL, End of Array
> >> [0f] 0x00000000, 0x00000000 DT_NULL, End of Array
> >> [10] 0x00000000, 0x00000000 DT_NULL, End of Array
> >>
> >> I think some more investigation into the need for .dynsym and .dynamic is
> >> still required...
>
> .dynsym may still be required if only for accessing the __u_boot_cmd
> structure. However, I may be able to hack that a little and not create a
> __u_boot_cmd symbol in the linker script (create some other temporary
> symbol) and populate __u_boot_cmd with a valid value after relocation. It
> will look a little weird, but may mean not loading this section into RAM
Why do you need to muck around with u_boot_cmd at all? Now that relocation
works you should be able to drop all that code/linker stuff?
>
> Other than that, .dynsym is now only needed to locate the sections during
> the relocation phase and can be kept in flash and not copied to RAM
Still occupies space in the *bin image though.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-13 11:53 ` Joakim Tjernlund
@ 2009-10-13 16:30 ` J. William Campbell
2009-10-13 16:55 ` Joakim Tjernlund
2009-10-13 20:06 ` Graeme Russ
1 sibling, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-13 16:30 UTC (permalink / raw)
To: u-boot
Joakim Tjernlund wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>
>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>>
>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>>
>> [Massive Snip :)]
>>
>>
>>>> So, all that is left are .dynsym and .dynamic ...
>>>> .dynsym
>>>> - Contains 70 entries (16 bytes each, 1120 bytes)
>>>> - 44 entries mimic those entries in .got which are not relocated
>>>> - 21 entries are the remaining symbols exported from the linker
>>>> script
>>>> - 4 entries are labels defined in inline asm and used in C
>>>>
>>> Try adding proper asm declarations. Look at what gcc
>>> generates for a function/variable and mimic these.
>>>
>> Thanks - Now .dynsym contains only exports from the linker script
>>
> :)
>
>>>> - 1 entry is a NULL entry
>>>>
>>>> .dynamic
>>>> - 88 bytes
>>>> - Array of Elf32_Dyn
>>>> - typedef struct {
>>>> Elf32_Sword d_tag;
>>>> union {
>>>> Elf32_Word d_val;
>>>> Elf32_Addr d_ptr;
>>>> } d_un;
>>>> } Elf32_Dyn;
>>>> - 0x11 entries
>>>> [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>>>> [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>>>> [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>>>> [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>>>> [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>>>> [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>>>> [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>>>> [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>>>> [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>>>>
>>> How big DT_REL is
>>>
>>>> [09] 0x00000013, 0x00000008 DT_RELENT, ???
>>>>
>>> hmm, cannot remember :)
>>>
>> How big an entry in DT_REL is
>>
>
> Right, how could I forget :)
>
>>>> [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>>>>
>>> Oops, you got text relocations. This is generally a bad thing.
>>> TEXTREL is commonly caused by asm code that isn't truly PIC, so it needs
>>> to modify the .text segment to adjust for relocation.
>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
>>> the culprit.
>>>
>>>
>> Alas I cannot - The relocations are a result of loading a register with a
>> return address when calling show_boot_progress in the very early stages of
>> initialisation prior to the stack becoming available. The x86 does not
>> allow direct access to the IP so the only way to find the 'current
>> execution address' is to 'call' to the next instruction and pop the return
>> address off the stack
>>
>
> hmm, same as ppc but that in itself should not cause a TEXTREL, should it?
> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
> but it is not important ATM I guess.
>
> Evil idea, skip -fpic et al. and add the full reloc procedure
> to relocate by rewriting directly in the TEXT segment. Then you save space
> but you need more relocation code. Something like dl_do_reloc from
> uClibc. Wonder how much extra code that would be? Not too much I think.
>
I think this approach will turn out to be a big win. At present, the
problem with just using the relocs is that objcopy is stripping them out
when u-boot.bin is created, as I understand it. It seems this can be
solved by changing the command switches appropriately, like using
--strip-unneeded. In any case, there is some combination of switches
that will preserve the relocation data. The executable code will get
smaller, there will be no .got, and the relocation data will be larger
(than with -fpic). In total size, it probably will be slightly smaller,
but that is a guess. The most important benefit of this approach is that
it will work for all architectures, thereby solving the problem once and
forever! Even if the result is a bit larger, the RAM footprint will be
reduced by the smaller object code size (since the relocation data need
not be copied into RAM). Having this approach as an option would be really
nice, since it would always "just work".
Best Regards,
Bill Campbell
>
>> This is not a problem because this is very low-level init that is not
>> called once relocated into RAM - These relocations can be safely ignored
>>
>
>
>>>> [0b] 0x6FFFFFFA, 0x00000236 ???, Entries in .rel.dyn
>>>> [0c] 0x00000000, 0x00000000 DT_NULL, End of Array
>>>> [0d] 0x00000000, 0x00000000 DT_NULL, End of Array
>>>> [0e] 0x00000000, 0x00000000 DT_NULL, End of Array
>>>> [0f] 0x00000000, 0x00000000 DT_NULL, End of Array
>>>> [10] 0x00000000, 0x00000000 DT_NULL, End of Array
>>>>
>>>> I think some more investigation into the need for .dynsym and .dynamic is
>>>> still required...
>>>>
>> .dynsym may still be required if only for accessing the __u_boot_cmd
>> structure. However, I may be able to hack that a little and not create a
>> __u_boot_cmd symbol in the linker script (create some other temporary
>> symbol) and populate __u_boot_cmd with a valid value after relocation. It
>> will look a little weird, but may mean not loading this section into RAM
>>
>
> Why do you need to muck around with u_boot_cmd at all? Now that relocation
> works you should be able to drop all that code/linker stuff?
>
>
>> Other than that, .dynsym is now only needed to locate the sections during
>> the relocation phase and can be kept in flash and not copied to RAM
>>
>
> Still occupies space in the *bin image though.
>
> _______________________________________________
> U-Boot mailing list
> U-Boot at lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot
>
>
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-13 16:30 ` J. William Campbell
@ 2009-10-13 16:55 ` Joakim Tjernlund
0 siblings, 0 replies; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-13 16:55 UTC (permalink / raw)
To: u-boot
"J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 13/10/2009 18:30:43:
>
> Joakim Tjernlund wrote:
> > Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
> >
> >> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
> >> <joakim.tjernlund@transmode.se> wrote:
> >>
> >>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
> >>>
> >> [Massive Snip :)]
> >>
> >>
> >>>> So, all that is left are .dynsym and .dynamic ...
> >>>> .dynsym
> >>>> - Contains 70 entries (16 bytes each, 1120 bytes)
> >>>> - 44 entries mimic those entries in .got which are not relocated
> >>>> - 21 entries are the remaining symbols exported from the linker
> >>>> script
> >>>> - 4 entries are labels defined in inline asm and used in C
> >>>>
> >>> Try adding proper asm declarations. Look at what gcc
> >>> generates for a function/variable and mimic these.
> >>>
> >> Thanks - Now .dynsym contains only exports from the linker script
> >>
> > :)
> >
> >>>> - 1 entry is a NULL entry
> >>>>
> >>>> .dynamic
> >>>> - 88 bytes
> >>>> - Array of Elf32_Dyn
> >>>> - typedef struct {
> >>>> Elf32_Sword d_tag;
> >>>> union {
> >>>> Elf32_Word d_val;
> >>>> Elf32_Addr d_ptr;
> >>>> } d_un;
> >>>> } Elf32_Dyn;
> >>>> - 0x11 entries
> >>>> [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
> >>>> [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
> >>>> [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
> >>>> [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
> >>>> [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
> >>>> [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
> >>>> [06] 0x00000015, 0x00000000 DT_DEBUG, ???
> >>>> [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
> >>>> [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
> >>>>
> >>> How big DT_REL is
> >>>
> >>>> [09] 0x00000013, 0x00000008 DT_RELENT, ???
> >>>>
> >>> hmm, cannot remember :)
> >>>
> >> How big an entry in DT_REL is
> >>
> >
> > Right, how could I forget :)
> >
> >>>> [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
> >>>>
> >>> Oops, you got text relocations. This is generally a bad thing.
> >>> TEXTREL is commonly caused by asm code that isn't truly PIC, so it needs
> >>> to modify the .text segment to adjust for relocation.
> >>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
> >>> the culprit.
> >>>
> >>>
> >> Alas I cannot - The relocations are a result of loading a register with a
> >> return address when calling show_boot_progress in the very early stages of
> >> initialisation prior to the stack becoming available. The x86 does not
> >> allow direct access to the IP so the only way to find the 'current
> >> execution address' is to 'call' to the next instruction and pop the return
> >> address off the stack
> >>
> >
> > hmm, same as ppc but that in itself should not cause a TEXTREL, should it?
> > Ahh, the 'call' is absolute, not relative? I guess there is some way around it
> > but it is not important ATM I guess.
> >
> > Evil idea, skip -fpic et al. and add the full reloc procedure
> > to relocate by rewriting directly in the TEXT segment. Then you save space
> > but you need more relocation code. Something like dl_do_reloc from
> > uClibc. Wonder how much extra code that would be? Not too much I think.
> >
> I think this approach will turn out to be a big win. At present, the
> problem with just using the relocs is that objcopy is stripping them out
> when u-boot.bin is created, as I understand it. It seems this can be
> solved by changing the command switches appropriately, like using
> --strip-unneeded. In any case, there is some combination of switches
> that will preserve the relocation data. The executable code will get
> smaller, there will be no .got, and the relocation data will be larger
> (than with -fpic). In total size, it probably will be slightly smaller,
> but that is a guess. The most important benefit of this approach is that
> it will work for all architectures, thereby solving the problem once and
> forever! Even if the result is a bit larger, the RAM footprint will be
> reduced by the smaller object code size (since the relocation data need
> not be copied into RAM). Having this approach as an option would be really
> nice, since it would always "just work".
Yes, I had this in the back of my head. I do think some other arch than ppc
will have to try this out though :)
I am not 100% sure this will work with my end goal: true PIC, so that I can
load the same image anywhere in flash.
Jocke
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-13 11:53 ` Joakim Tjernlund
2009-10-13 16:30 ` J. William Campbell
@ 2009-10-13 20:06 ` Graeme Russ
[not found] ` <OF32A18F38.511FF11C-ONC125764E.00750716-C125764E.007534EE@ <4AD511E4.9090204@comcast.net>
2009-10-13 21:20 ` Joakim Tjernlund
1 sibling, 2 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-13 20:06 UTC (permalink / raw)
To: u-boot
On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>> > Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>
>> [Massive Snip :)]
>>
>> >
>> >>
>> >> So, all that is left are .dynsym and .dynamic ...
>> >> .dynsym
>> >> - Contains 70 entries (16 bytes each, 1120 bytes)
>> >> - 44 entries mimic those entries in .got which are not relocated
>> >> - 21 entries are the remaining symbols exported from the linker
>> >> script
>> >> - 4 entries are labels defined in inline asm and used in C
>> > Try adding proper asm declarations. Look at what gcc
>> > generates for a function/variable and mimic these.
>>
>> Thanks - Now .dynsym contains only exports from the linker script
> :)
>>
>> >
>> >> - 1 entry is a NULL entry
>> >>
>> >> .dynamic
>> >> - 88 bytes
>> >> - Array of Elf32_Dyn
>> >> - typedef struct {
>> >> Elf32_Sword d_tag;
>> >> union {
>> >> Elf32_Word d_val;
>> >> Elf32_Addr d_ptr;
>> >> } d_un;
>> >> } Elf32_Dyn;
>> >> - 0x11 entries
>> >> [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>> >> [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>> >> [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>> >> [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>> >> [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>> >> [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>> >> [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>> >> [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>> >> [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>> > How big DT_REL is
>> >> [09] 0x00000013, 0x00000008 DT_RELENT, ???
>> > hmm, cannot remember :)
>>
>> How big an entry in DT_REL is
>
> Right, how could I forget :)
>>
>> >> [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>> > Oops, you got text relocations. This is generally a bad thing.
>> > TEXTREL is commonly caused by asm code that isn't truly PIC, so it needs
>> > to modify the .text segment to adjust for relocation.
>> > You should get rid of this one. Look for DT_TEXTREL in .o files to find
>> > the culprit.
>> >
>>
>> Alas I cannot - The relocations are a result of loading a register with a
>> return address when calling show_boot_progress in the very early stages of
>> initialisation prior to the stack becoming available. The x86 does not
>> allow direct access to the IP so the only way to find the 'current
>> execution address' is to 'call' to the next instruction and pop the return
>> address off the stack
>
> hmm, same as ppc but that in itself should not cause a TEXTREL, should it?
> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
> but it is not important ATM I guess.
>
> Evil idea, skip -fpic et al. and add the full reloc procedure
> to relocate by rewriting directly in the TEXT segment. Then you save space
> but you need more relocation code. Something like dl_do_reloc from
> uClibc. Wonder how much extra code that would be? Not too much I think.
>
With the following flags:
PLATFORM_RELFLAGS += -fvisibility=hidden
PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
this might mean I need the symbol table in the binary in order to resolve
them
>>
>> This is not a problem because this is very low-level init that is not
>> called once relocated into RAM - These relocations can be safely ignored
>
>>
>> >> [0b] 0x6FFFFFFA, 0x00000236 ???, Entries in .rel.dyn
>> >> [0c] 0x00000000, 0x00000000 DT_NULL, End of Array
>> >> [0d] 0x00000000, 0x00000000 DT_NULL, End of Array
>> >> [0e] 0x00000000, 0x00000000 DT_NULL, End of Array
>> >> [0f] 0x00000000, 0x00000000 DT_NULL, End of Array
>> >> [10] 0x00000000, 0x00000000 DT_NULL, End of Array
>> >>
>> >> I think some more investigation into the need for .dynsym and .dynamic is
>> >> still required...
>>
>> .dynsym may still be required if only for accessing the __u_boot_cmd
>> structure. However, I may be able to hack that a little and not create a
>> __u_boot_cmd symbol in the linker script (create some other temporary
>> symbol) and populate __u_boot_cmd with a valid value after relocation. It
>> will look a little weird, but may mean not loading this section into RAM
>
> Why do you need to muck around with u_boot_cmd at all? Now that relocation
> works you should be able to drop all that code/linker stuff?
>
>>
>> Other than that, .dynsym is now only needed to locate the sections during
>> the relocation phase and can be kept in flash and not copied to RAM
>
> Still occupies space in the *bin image though.
>
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-13 20:06 ` Graeme Russ
[not found] ` <OF32A18F38.511FF11C-ONC125764E.00750716-C125764E.007534EE@ <4AD511E4.9090204@comcast.net>
@ 2009-10-13 21:20 ` Joakim Tjernlund
2009-10-13 23:48 ` J. William Campbell
1 sibling, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-13 21:20 UTC (permalink / raw)
To: u-boot
Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
>
> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
> <joakim.tjernlund@transmode.se> wrote:
> > Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
> >> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
> >> <joakim.tjernlund@transmode.se> wrote:
> >> > Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
> >>
> >> [Massive Snip :)]
> >>
> >> >
> >> >>
> >> >> So, all that is left are .dynsym and .dynamic ...
> >> >> .dynsym
> >> >> - Contains 70 entries (16 bytes each, 1120 bytes)
> >> >> - 44 entries mimic those entries in .got which are not relocated
> >> >> - 21 entries are the remaining symbols exported from the linker
> >> >> script
> >> >> - 4 entries are labels defined in inline asm and used in C
> >> > Try adding proper asm declarations. Look at what gcc
> >> > generates for a function/variable and mimic these.
> >>
> >> Thanks - Now .dynsym contains only exports from the linker script
> > :)
> >>
> >> >
> >> >> - 1 entry is a NULL entry
> >> >>
> >> >> .dynamic
> >> >> - 88 bytes
> >> >> - Array of Elf32_Dyn
> >> >> - typedef struct {
> >> >> Elf32_Sword d_tag;
> >> >> union {
> >> >> Elf32_Word d_val;
> >> >> Elf32_Addr d_ptr;
> >> >> } d_un;
> >> >> } Elf32_Dyn;
> >> >> - 0x11 entries
> >> >> [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
> >> >> [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
> >> >> [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
> >> >> [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
> >> >> [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
> >> >> [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
> >> >> [06] 0x00000015, 0x00000000 DT_DEBUG, ???
> >> >> [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
> >> >> [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
> >> > How big DT_REL is
> >> >> [09] 0x00000013, 0x00000008 DT_RELENT, ???
> >> > hmm, cannot remember :)
> >>
> >> How big an entry in DT_REL is
> >
> > Right, how could I forget :)
> >>
> >> >> [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
> >> > Oops, you got text relocations. This is generally a bad thing.
> >> > TEXTREL is commonly caused by asm code that isn't truly PIC, so it needs
> >> > to modify the .text segment to adjust for relocation.
> >> > You should get rid of this one. Look for DT_TEXTREL in .o files to find
> >> > the culprit.
> >> >
> >>
> >> Alas I cannot - The relocations are a result of loading a register with a
> >> return address when calling show_boot_progress in the very early stages of
> >> initialisation prior to the stack becoming available. The x86 does not
> >> allow direct access to the IP so the only way to find the 'current
> >> execution address' is to 'call' to the next instruction and pop the return
> >> address off the stack
> >
> > hmm, same as ppc but that in itself should not cause a TEXTREL, should it?
> > Ahh, the 'call' is absolute, not relative? I guess there is some way around it
> > but it is not important ATM I guess.
> >
> > Evil idea, skip -fpic et al. and add the full reloc procedure
> > to relocate by rewriting directly in the TEXT segment. Then you save space
> > but you need more relocation code. Something like dl_do_reloc from
> > uClibc. Wonder how much extra code that would be? Not too much I think.
> >
>
> With the following flags
>
> PLATFORM_RELFLAGS += -fvisibility=hidden
> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
>
> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
> this might mean I need the symbol table in the binary in order to resolve
> them
Possibly, but I think you only need to add an offset to all those
relocs.
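A rough sketch of what "add an offset" could look like, written as host-side C rather than real U-Boot code. It assumes the image was fully linked with --emit-relocs, so each patched word already holds its link-time value, and that every R_386_32 target lies inside the image being moved (absolute symbols such as linker-script size exports would need to be filtered out first). The names rel32_t and apply_relocs are hypothetical; the relocation type numbers are the standard i386 ones. R_386_PC32 fields need no change when the whole image moves as one block, because the field holds S + A - P and both the symbol S and the place P shift by the same amount.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for Elf32_Rel so the sketch does not need <elf.h>. */
typedef struct {
    uint32_t r_offset;  /* link-time address of the word to patch */
    uint32_t r_info;    /* (symbol index << 8) | relocation type */
} rel32_t;

enum { R_386_32 = 1, R_386_PC32 = 2, R_386_RELATIVE = 8 };

/*
 * Apply `delta` (new base minus link base) to an image copied to `image`.
 * Returns the number of words patched, or -1 on an out-of-range offset
 * or an unsupported relocation type.
 */
static int apply_relocs(uint8_t *image, uint32_t link_base,
                        uint32_t image_size, const rel32_t *rel,
                        size_t nrel, uint32_t delta)
{
    int patched = 0;

    assert(image != NULL && image_size >= 4);

    for (size_t i = 0; i < nrel; i++) {
        uint32_t type = rel[i].r_info & 0xffu;
        uint32_t off = rel[i].r_offset - link_base;
        uint32_t word;

        /* Also catches r_offset < link_base, which wraps to a huge value. */
        if (off > image_size - 4u)
            return -1;

        switch (type) {
        case R_386_32:        /* absolute address: S + A */
        case R_386_RELATIVE:  /* base-relative address: B + A */
            memcpy(&word, image + off, sizeof(word));
            word += delta;
            memcpy(image + off, &word, sizeof(word));
            patched++;
            break;
        case R_386_PC32:      /* S + A - P is unchanged by a uniform move */
            break;
        default:
            return -1;
        }
    }
    return patched;
}
```

No symbol table is consulted here; the symbol index in r_info is simply ignored, which is only valid under the assumptions stated above.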
Jocke
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-13 21:20 ` Joakim Tjernlund
@ 2009-10-13 23:48 ` J. William Campbell
2009-10-14 7:25 ` Joakim Tjernlund
0 siblings, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-13 23:48 UTC (permalink / raw)
To: u-boot
Joakim Tjernlund wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
>
>
>> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>>
>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>>>
>>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>
>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>>>>
>>>> [Massive Snip :)]
>>>>
>>>>
>>>>>> So, all that is left are .dynsym and .dynamic ...
>>>>>> .dynsym
>>>>>> - Contains 70 entries (16 bytes each, 1120 bytes)
>>>>>> - 44 entries mimic those entries in .got which are not relocated
>>>>>> - 21 entries are the remaining symbols exported from the linker
>>>>>> script
>>>>>> - 4 entries are labels defined in inline asm and used in C
>>>>>>
>>>>> Try adding proper asm declarations. Look at what gcc
>>>>> generates for a function/variable and mimic these.
>>>>>
>>>> Thanks - Now .dynsym contains only exports from the linker script
>>>>
>>> :)
>>>
>>>>>> - 1 entry is a NULL entry
>>>>>>
>>>>>> .dynamic
>>>>>> - 88 bytes
>>>>>> - Array of Elf32_Dyn
>>>>>> - typedef struct {
>>>>>> Elf32_Sword d_tag;
>>>>>> union {
>>>>>> Elf32_Word d_val;
>>>>>> Elf32_Addr d_ptr;
>>>>>> } d_un;
>>>>>> } Elf32_Dyn;
>>>>>> - 0x11 entries
>>>>>> [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>>>>>> [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>>>>>> [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>>>>>> [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>>>>>> [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>>>>>> [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>>>>>> [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>>>>>> [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>>>>>> [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>>>>>>
>>>>> How big DT_REL is
>>>>>
>>>>>> [09] 0x00000013, 0x00000008 DT_RELENT, ???
>>>>>>
>>>>> hmm, cannot remeber :)
>>>>>
>>>> How big an entry in DT_REL is
>>>>
>>> Right, how could I forget :)
>>>
>>>>>> [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>>>>>>
>>>>> Oops, you got text relocations. This is generally a bad thing.
>>>>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
>>>>> to modify the .text segment to adjust for relocation.
>>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
>>>>> the culprit.
>>>>>
>>>>>
>>>> Alas I cannot - The relocations are a result of loading a register with a
>>>> return address when calling show_boot_progress in the very early stages of
>>>> initialisation prior to the stack becoming available. The x86 does not
>>>> allow direct access to the IP so the only way to find the 'current
>>>> execution address' is to 'call' to the next instruction and pop the return
>>>> address off the stack
>>>>
>>> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
>>> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
>>> but it is not important ATM I guess.
>>>
>>> Evil idea, skip -fpic et. all and add the full reloc procedure
>>> to relocate by rewriting directly in TEXT segment. Then you save space
>>> but you need more relocation code. Something like dl_do_reloc from
>>> uClibc. Wonder how much extra code that would be? Not too much I think.
>>>
>>>
>> With the following flags
>>
>> PLATFORM_RELFLAGS += -fvisibility=hidden
>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
>>
>> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
>> this might mean I need the symbol table in the binary in order to resolve
>> them
>>
>
> Possibly, but I think you only need to add an offset to all those
> relocs.
>
Almost right. The relocations specify a symbol value that needs to be
added to the data in memory to relocate the reference. The symbol values
involved should be the start of the text section for program references,
the start of the uninitialized data section for bss references, and the
start of the data section for initialized data and constants. So there
are about four symbols whose value you need to keep. Take a look at
http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
already looked at) and it tells you what to do with R_386_PC32 and
R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
will remove all the symbols you don't actually need, but I don't know
that for sure. Note also that you can change the section flags of a
section marked noload to load.
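For reference, here is a minimal sketch of the two calculations as the ELF spec defines them for an object that has not been linked yet, where the patched word still holds the addend A (i386 uses Elf32_Rel, so there is no separate addend field). R_386_32 is S + A and R_386_PC32 is S + A - P. The struct and helper names are hypothetical, not taken from U-Boot, and the code assumes a little-endian host:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Relocation types from the i386 ELF supplement. */
#define R_386_32   1   /* S + A     */
#define R_386_PC32 2   /* S + A - P */

/* Same layout as Elf32_Rel (8 bytes, matching DT_RELENT above). */
struct rel32 {
    uint32_t r_offset;  /* address of the word to patch (P) */
    uint32_t r_info;    /* (symbol index << 8) | type */
};

#define REL_SYM(info)  ((info) >> 8)
#define REL_TYPE(info) ((info) & 0xff)

/*
 * Apply one relocation. 'mem' is the image buffer, 'base' is the address
 * that mem[0] corresponds to, and 'sym_value' is S for the referenced
 * symbol. Returns 0 on success, -1 for a type this sketch does not handle.
 */
static int apply_rel(uint8_t *mem, uint32_t base,
                     const struct rel32 *rel, uint32_t sym_value)
{
    uint32_t p = rel->r_offset;
    uint32_t word;

    assert(p >= base);
    memcpy(&word, mem + (p - base), sizeof(word));  /* implicit addend A */

    switch (REL_TYPE(rel->r_info)) {
    case R_386_32:
        word = sym_value + word;
        break;
    case R_386_PC32:
        word = sym_value + word - p;
        break;
    default:
        return -1;
    }

    memcpy(mem + (p - base), &word, sizeof(word));
    return 0;
}
```

Note that in a fully linked image the word already holds the resolved value rather than the bare addend, so this formula cannot be re-applied to it unchanged.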
Best Regards,
Bill Campbell
> Jokce
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-13 23:48 ` J. William Campbell
@ 2009-10-14 7:25 ` Joakim Tjernlund
2009-10-14 11:48 ` Graeme Russ
2009-10-14 15:35 ` J. William Campbell
0 siblings, 2 replies; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-14 7:25 UTC (permalink / raw)
To: u-boot
"J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
>
> Joakim Tjernlund wrote:
> > Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
> >
> >
> >> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
> >> <joakim.tjernlund@transmode.se> wrote:
> >>
> >>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
> >>>
> >>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
> >>>> <joakim.tjernlund@transmode.se> wrote:
> >>>>
> >>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
> >>>>>
> >>>> [Massive Snip :)]
> >>>>
> >>>>
> >>>>>> So, all that is left are .dynsym and .dynamic ...
> >>>>>> .dynsym
> >>>>>> - Contains 70 entries (16 bytes each, 1120 bytes)
> >>>>>> - 44 entries mimic those entries in .got which are not relocated
> >>>>>> - 21 entries are the remaining symbols exported from the linker
> >>>>>> script
> >>>>>> - 4 entries are labels defined in inline asm and used in C
> >>>>>>
> >>>>> Try adding proper asm declarations. Look at what gcc
> >>>>> generates for a function/variable and mimic these.
> >>>>>
> >>>> Thanks - Now .dynsym contains only exports from the linker script
> >>>>
> >>> :)
> >>>
> >>>>>> - 1 entry is a NULL entry
> >>>>>>
> >>>>>> .dynamic
> >>>>>> - 88 bytes
> >>>>>> - Array of Elf32_Dyn
> >>>>>> - typedef struct {
> >>>>>> Elf32_Sword d_tag;
> >>>>>> union {
> >>>>>> Elf32_Word d_val;
> >>>>>> Elf32_Addr d_ptr;
> >>>>>> } d_un;
> >>>>>> } Elf32_Dyn;
> >>>>>> - 0x11 entries
> >>>>>> [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
> >>>>>> [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
> >>>>>> [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
> >>>>>> [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
> >>>>>> [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
> >>>>>> [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
> >>>>>> [06] 0x00000015, 0x00000000 DT_DEBUG, ???
> >>>>>> [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
> >>>>>> [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
> >>>>>>
> >>>>> How big DT_REL is
> >>>>>
> >>>>>> [09] 0x00000013, 0x00000008 DT_RELENT, ???
> >>>>>>
> >>>>> hmm, cannot remeber :)
> >>>>>
> >>>> How big an entry in DT_REL is
> >>>>
> >>> Right, how could I forget :)
> >>>
> >>>>>> [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
> >>>>>>
> >>>>> Oops, you got text relocations. This is generally a bad thing.
> >>>>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
> >>>>> to modify the .text segment to adjust for relocation.
> >>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
> >>>>> the culprit.
> >>>>>
> >>>>>
> >>>> Alas I cannot - The relocations are a result of loading a register with a
> >>>> return address when calling show_boot_progress in the very early stages of
> >>>> initialisation prior to the stack becoming available. The x86 does not
> >>>> allow direct access to the IP so the only way to find the 'current
> >>>> execution address' is to 'call' to the next instruction and pop the return
> >>>> address off the stack
> >>>>
> >>> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
> >>> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
> >>> but it is not important ATM I guess.
> >>>
> >>> Evil idea, skip -fpic et. all and add the full reloc procedure
> >>> to relocate by rewriting directly in TEXT segment. Then you save space
> >>> but you need more relocation code. Something like dl_do_reloc from
> >>> uClibc. Wonder how much extra code that would be? Not too much I think.
> >>>
> >>>
> >> With the following flags
> >>
> >> PLATFORM_RELFLAGS += -fvisibility=hidden
> >> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
> >> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
> >>
> >> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
> >> this might mean I need the symbol table in the binary in order to resolve
> >> them
> >>
BTW, how many relocs do you get compared with -fPIC? I suspect you get more
now, but hopefully not that many more.
> >
> > Possibly, but I think you only need to add an offset to all those
> > relocs.
> >
> Almost right. The relocations specify a symbol value that needs to be
> added to the data in memory to relocate the reference. The symbol values
> involved should be the start of the text section for program references,
> the start of the uninitialized data section for bss references, and the
> start of the data section for initialized data and constants. So there
> are about four symbols whose value you need to keep. Take a look at
> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
> already looked at) and it tells you what to do with R_386_PC32 ad
> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
> will remove all the symbols you don't actually need, but I don't know
> that for sure. Note also that you can change the section flags of a
> section marked noload to load.
I still think you can get away with just ADDING an offset. The image is linked to a
specific address and then you move the whole image to a new address. Therefore
you should be able to read the current address, add the offset, and write back the new address.
Normally one does what you describe, but here we know that the whole image has moved, so
we don't have to calculate the new address from scratch.
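The read / add offset / write back idea could look roughly like the sketch below. It assumes the relocation table survives in the image as plain Elf32_Rel-style entries and that only absolute (R_386_32) words need touching; the struct and function names are made up for illustration, and a little-endian host is assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_386_32 1

/* Same layout as Elf32_Rel. */
struct rel32 {
    uint32_t r_offset;  /* link-time address of the word to patch */
    uint32_t r_info;    /* (symbol index << 8) | type */
};

/*
 * Fix up an image that was linked at 'link_base' and will run at
 * link_base + delta. 'mem' holds the image bytes. Every R_386_32 word
 * gets 'delta' added; all other relocation types are skipped.
 * Returns the number of words patched.
 */
static size_t relocate_by_delta(uint8_t *mem, size_t mem_size,
                                uint32_t link_base,
                                const struct rel32 *rel, size_t nrel,
                                uint32_t delta)
{
    size_t patched = 0;

    for (size_t i = 0; i < nrel; i++) {
        uint32_t off, word;

        if ((rel[i].r_info & 0xff) != R_386_32)
            continue;

        off = rel[i].r_offset - link_base;
        assert((size_t)off + sizeof(word) <= mem_size);

        memcpy(&word, mem + off, sizeof(word));  /* read current address */
        word += delta;                           /* add the offset */
        memcpy(mem + off, &word, sizeof(word));  /* write it back */
        patched++;
    }
    return patched;
}
```

No symbol values are consulted here, which is the point: the symbol table would not need to be kept in the binary for this scheme.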
Jocke
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-14 7:25 ` Joakim Tjernlund
@ 2009-10-14 11:48 ` Graeme Russ
2009-10-14 12:38 ` Joakim Tjernlund
2009-10-14 15:35 ` J. William Campbell
1 sibling, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-14 11:48 UTC (permalink / raw)
To: u-boot
On Wed, Oct 14, 2009 at 6:25 PM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
> "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
>>
>> Joakim Tjernlund wrote:
>> > Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
>> >
>> >
>> >> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
>> >> <joakim.tjernlund@transmode.se> wrote:
>> >>
>> >>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>> >>>
>> >>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>> >>>> <joakim.tjernlund@transmode.se> wrote:
>> >>>>
>> >>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>> >>>>>
>> >>>> [Massive Snip :)]
>> >>>>
>> >>>>
>> >>>>>> So, all that is left are .dynsym and .dynamic ...
>> >>>>>> .dynsym
>> >>>>>> - Contains 70 entries (16 bytes each, 1120 bytes)
>> >>>>>> - 44 entries mimic those entries in .got which are not relocated
>> >>>>>> - 21 entries are the remaining symbols exported from the linker
>> >>>>>> script
>> >>>>>> - 4 entries are labels defined in inline asm and used in C
>> >>>>>>
>> >>>>> Try adding proper asm declarations. Look at what gcc
>> >>>>> generates for a function/variable and mimic these.
>> >>>>>
>> >>>> Thanks - Now .dynsym contains only exports from the linker script
>> >>>>
>> >>> :)
>> >>>
>> >>>>>> - 1 entry is a NULL entry
>> >>>>>>
>> >>>>>> .dynamic
>> >>>>>> - 88 bytes
>> >>>>>> - Array of Elf32_Dyn
>> >>>>>> - typedef struct {
>> >>>>>> Elf32_Sword d_tag;
>> >>>>>> union {
>> >>>>>> Elf32_Word d_val;
>> >>>>>> Elf32_Addr d_ptr;
>> >>>>>> } d_un;
>> >>>>>> } Elf32_Dyn;
>> >>>>>> - 0x11 entries
>> >>>>>> [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>> >>>>>> [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>> >>>>>> [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>> >>>>>> [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>> >>>>>> [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>> >>>>>> [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>> >>>>>> [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>> >>>>>> [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>> >>>>>> [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>> >>>>>>
>> >>>>> How big DT_REL is
>> >>>>>
>> >>>>>> [09] 0x00000013, 0x00000008 DT_RELENT, ???
>> >>>>>>
>> >>>>> hmm, cannot remeber :)
>> >>>>>
>> >>>> How big an entry in DT_REL is
>> >>>>
>> >>> Right, how could I forget :)
>> >>>
>> >>>>>> [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>> >>>>>>
>> >>>>> Oops, you got text relocations. This is generally a bad thing.
>> >>>>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
>> >>>>> to modify the .text segment to adjust for relocation.
>> >>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
>> >>>>> the culprit.
>> >>>>>
>> >>>>>
>> >>>> Alas I cannot - The relocations are a result of loading a register with a
>> >>>> return address when calling show_boot_progress in the very early stages of
>> >>>> initialisation prior to the stack becoming available. The x86 does not
>> >>>> allow direct access to the IP so the only way to find the 'current
>> >>>> execution address' is to 'call' to the next instruction and pop the return
>> >>>> address off the stack
>> >>>>
>> >>> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
>> >>> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
>> >>> but it is not important ATM I guess.
>> >>>
>> >>> Evil idea, skip -fpic et. all and add the full reloc procedure
>> >>> to relocate by rewriting directly in TEXT segment. Then you save space
>> >>> but you need more relocation code. Something like dl_do_reloc from
>> >>> uClibc. Wonder how much extra code that would be? Not too much I think.
>> >>>
>> >>>
>> >> With the following flags
>> >>
>> >> PLATFORM_RELFLAGS += -fvisibility=hidden
>> >> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>> >> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
>> >>
>> >> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
>> >> this might mean I need the symbol table in the binary in order to resolve
>> >> them
>> >>
>
> BTW, how many relocs do you get compared with -fPIC? I suspect you more
> now but hopefully not that many more.
>
>> >
>> > Possibly, but I think you only need to add an offset to all those
>> > relocs.
>> >
>> Almost right. The relocations specify a symbol value that needs to be
>> added to the data in memory to relocate the reference. The symbol values
>> involved should be the start of the text section for program references,
>> the start of the uninitialized data section for bss references, and the
>> start of the data section for initialized data and constants. So there
>> are about four symbols whose value you need to keep. Take a look at
>> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
>> already looked at) and it tells you what to do with R_386_PC32 ad
>> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
>> will remove all the symbols you don't actually need, but I don't know
>> that for sure. Note also that you can change the section flags of a
>> section marked noload to load.
>
> Still think you can get away with just ADDING an offset. The image is linked to a
> specific address and then you move the whole image to a new address. Therefore
> you should be able to read the current address, add offset, write back the new address.
>
OK, I don't really get this at all....
This code:
printf ("\n\n%s\n\n", version_string);
gets compiled into:
  380403e7:  68 a4 18 05 38    push   $0x380518a4
  380403ec:  68 de 2c 05 38    push   $0x38052cde
  380403f1:  e8 4f 84 00 00    call   38048845 <printf>
With relocation entries in .rel.text of:
  Offset    Info      Type        Sym.Value  Sym. Name
  380403e8  00016201  R_386_32    380519f0   version_string
  380403ed  00000201  R_386_32    380519f0   .rodata
  380403f2  00016b02  R_386_PC32  38048991   printf
Now I get the first two (R_386_32) entries - Relocation involves a simple
addition of an offset to the values at addresses 0x380403e8 and 0x380403ed
(of course, these addresses will be offset)
However, the R_386_PC32 is an enigma - The call is already relative -
there is no need to relocate it at all (call is a position independent
opcode because it is a relative jump!)
Will all R_386_PC32 be like this? Can I simply ignore them all? If so, why
do they even need to be generated?
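One way to check the "already relative" observation with the bytes from the listing above: the E8 opcode stores the target minus the address of the next instruction, so if caller and callee move by the same amount the stored value does not change. The helper names and the 0x00100000 move are invented for the example. As far as I know, --emit-relocs simply keeps every link-time relocation in the output whether or not it is needed at run time, which would explain why these entries are there at all.

```c
#include <stdint.h>

/* Target of a 5-byte 'call rel32' (opcode E8) located at insn_addr. */
static uint32_t call_target(uint32_t insn_addr, int32_t rel32)
{
    /* The displacement is relative to the end of the instruction. */
    return insn_addr + 5u + (uint32_t)rel32;
}

/*
 * Displacement the same call would need after both the call site and
 * the target have been moved by 'delta'.
 */
static int32_t rel32_after_move(uint32_t insn_addr, uint32_t target,
                                uint32_t delta)
{
    return (int32_t)((target + delta) - (insn_addr + delta + 5u));
}
```

With the values from the listing, call_target(0x380403f1, 0x844f) gives 0x38048845, and rel32_after_move(0x380403f1, 0x38048845, 0x00100000) is still 0x844f. This only holds while the call site and its target move together.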
Hmmm
Graeme
> Normally one do what you describe but here we know that the whole img has moved so
> we don't have to do calculate the new address from scratch.
>
> Jocke
>
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-14 11:48 ` Graeme Russ
@ 2009-10-14 12:38 ` Joakim Tjernlund
2009-10-14 16:45 ` J. William Campbell
0 siblings, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-14 12:38 UTC (permalink / raw)
To: u-boot
Graeme Russ <graeme.russ@gmail.com> wrote on 14/10/2009 13:48:27:
>
> On Wed, Oct 14, 2009 at 6:25 PM, Joakim Tjernlund
> <joakim.tjernlund@transmode.se> wrote:
> > "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
> >>
> >> Joakim Tjernlund wrote:
> >> > Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
> >> >
> >> >
> >> >> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
> >> >> <joakim.tjernlund@transmode.se> wrote:
> >> >>
> >> >>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
> >> >>>
> >> >>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
> >> >>>> <joakim.tjernlund@transmode.se> wrote:
> >> >>>>
> >> >>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
> >> >>>>>
> >> >>>> [Massive Snip :)]
> >> >>>>
> >> >>>>
> >> >>>>>> So, all that is left are .dynsym and .dynamic ...
> >> >>>>>> .dynsym
> >> >>>>>> - Contains 70 entries (16 bytes each, 1120 bytes)
> >> >>>>>> - 44 entries mimic those entries in .got which are not relocated
> >> >>>>>> - 21 entries are the remaining symbols exported from the linker
> >> >>>>>> script
> >> >>>>>> - 4 entries are labels defined in inline asm and used in C
> >> >>>>>>
> >> >>>>> Try adding proper asm declarations. Look at what gcc
> >> >>>>> generates for a function/variable and mimic these.
> >> >>>>>
> >> >>>> Thanks - Now .dynsym contains only exports from the linker script
> >> >>>>
> >> >>> :)
> >> >>>
> >> >>>>>> - 1 entry is a NULL entry
> >> >>>>>>
> >> >>>>>> .dynamic
> >> >>>>>> - 88 bytes
> >> >>>>>> - Array of Elf32_Dyn
> >> >>>>>> - typedef struct {
> >> >>>>>> Elf32_Sword d_tag;
> >> >>>>>> union {
> >> >>>>>> Elf32_Word d_val;
> >> >>>>>> Elf32_Addr d_ptr;
> >> >>>>>> } d_un;
> >> >>>>>> } Elf32_Dyn;
> >> >>>>>> - 0x11 entries
> >> >>>>>> [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
> >> >>>>>> [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
> >> >>>>>> [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
> >> >>>>>> [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
> >> >>>>>> [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
> >> >>>>>> [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
> >> >>>>>> [06] 0x00000015, 0x00000000 DT_DEBUG, ???
> >> >>>>>> [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
> >> >>>>>> [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
> >> >>>>>>
> >> >>>>> How big DT_REL is
> >> >>>>>
> >> >>>>>> [09] 0x00000013, 0x00000008 DT_RELENT, ???
> >> >>>>>>
> >> >>>>> hmm, cannot remeber :)
> >> >>>>>
> >> >>>> How big an entry in DT_REL is
> >> >>>>
> >> >>> Right, how could I forget :)
> >> >>>
> >> >>>>>> [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
> >> >>>>>>
> >> >>>>> Oops, you got text relocations. This is generally a bad thing.
> >> >>>>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
> >> >>>>> to modify the .text segment to adjust for relocation.
> >> >>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
> >> >>>>> the culprit.
> >> >>>>>
> >> >>>>>
> >> >>>> Alas I cannot - The relocations are a result of loading a register with a
> >> >>>> return address when calling show_boot_progress in the very early stages of
> >> >>>> initialisation prior to the stack becoming available. The x86 does not
> >> >>>> allow direct access to the IP so the only way to find the 'current
> >> >>>> execution address' is to 'call' to the next instruction and pop the return
> >> >>>> address off the stack
> >> >>>>
> >> >>> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
> >> >>> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
> >> >>> but it is not important ATM I guess.
> >> >>>
> >> >>> Evil idea, skip -fpic et. all and add the full reloc procedure
> >> >>> to relocate by rewriting directly in TEXT segment. Then you save space
> >> >>> but you need more relocation code. Something like dl_do_reloc from
> >> >>> uClibc. Wonder how much extra code that would be? Not too much I think.
> >> >>>
> >> >>>
> >> >> With the following flags
> >> >>
> >> >> PLATFORM_RELFLAGS += -fvisibility=hidden
> >> >> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
> >> >> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
> >> >>
> >> >> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
> >> >> this might mean I need the symbol table in the binary in order to resolve
> >> >> them
> >> >>
> >
> > BTW, how many relocs do you get compared with -fPIC? I suspect you more
> > now but hopefully not that many more.
> >
> >> >
> >> > Possibly, but I think you only need to add an offset to all those
> >> > relocs.
> >> >
> >> Almost right. The relocations specify a symbol value that needs to be
> >> added to the data in memory to relocate the reference. The symbol values
> >> involved should be the start of the text section for program references,
> >> the start of the uninitialized data section for bss references, and the
> >> start of the data section for initialized data and constants. So there
> >> are about four symbols whose value you need to keep. Take a look at
> >> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
> >> already looked at) and it tells you what to do with R_386_PC32 ad
> >> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
> >> will remove all the symbols you don't actually need, but I don't know
> >> that for sure. Note also that you can change the section flags of a
> >> section marked noload to load.
> >
> > Still think you can get away with just ADDING an offset. The image is linked to a
> > specific address and then you move the whole image to a new address. Therefore
> > you should be able to read the current address, add offset, write back the new address.
> >
>
> OK, I don't really get this at all....
>
> This code:
>
> printf ("\n\n%s\n\n", version_string);
>
> gets compiled into:
>
> 380403e7: 68 a4 18 05 38 push $0x380518a4
> 380403ec: 68 de 2c 05 38 push $0x38052cde
> 380403f1: e8 4f 84 00 00 call 38048845 <printf>
>
> With relocation entries in .rel.text of:
>
> Offset Info Type Sym.Value Sym. Name
> 380403e8 00016201 R_386_32 380519f0 version_string
> 380403ed 00000201 R_386_32 380519f0 .rodata
> 380403f2 00016b02 R_386_PC32 38048991 printf
>
> Now I get the first two (R_386_32) entries - Relocation involves a simple
> addition of an offset to the values at addresses 0x380403e8 and 0x380403ed
> (of course, these addresses will be offset)
>
> However, the R_386_PC32 is an enigma - The call is already relative -
> there is no need to relocate it at all (call is a position independent
> opcode because it is a relative jump!)
Yes, but printf is defined in glibc, so the app needs to relocate the call
to glibc. U-Boot has all it needs, so I think you should not have PC32 relocs there.
Try defining a local static function. For non-static functions
you may need to define visibility=hidden and/or -Bsymbolic too.
You also need to look at the image after final linking.
>
> Will all R_386_PC32 be like this? Can I simply ignore them all? If so, why
> do they even need to be generated?
Hopefully you won't have any. Not sure about weak functions though. These might
need PC32 relocs in some cases.
Also, if you look at _dl_do_reloc() in uClibc/ldso/ldso/i386/elfinterp.c I think
you can replace symbol_addr with relocation offset.
Jocke
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-14 7:25 ` Joakim Tjernlund
2009-10-14 11:48 ` Graeme Russ
@ 2009-10-14 15:35 ` J. William Campbell
2009-10-14 16:05 ` Joakim Tjernlund
1 sibling, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-14 15:35 UTC (permalink / raw)
To: u-boot
Joakim Tjernlund wrote:
> "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
>
>> Joakim Tjernlund wrote:
>>
>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
>>>
>>>
>>>
>>>> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>
>>>>
>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>>>>>
>>>>>
>>>>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>
>>>>>>
>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>>>>>>
>>>>>>>
>>>>>> [Massive Snip :)]
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> So, all that is left are .dynsym and .dynamic ...
>>>>>>>> .dynsym
>>>>>>>> - Contains 70 entries (16 bytes each, 1120 bytes)
>>>>>>>> - 44 entries mimic those entries in .got which are not relocated
>>>>>>>> - 21 entries are the remaining symbols exported from the linker
>>>>>>>> script
>>>>>>>> - 4 entries are labels defined in inline asm and used in C
>>>>>>>>
>>>>>>>>
>>>>>>> Try adding proper asm declarations. Look at what gcc
>>>>>>> generates for a function/variable and mimic these.
>>>>>>>
>>>>>>>
>>>>>> Thanks - Now .dynsym contains only exports from the linker script
>>>>>>
>>>>>>
>>>>> :)
>>>>>
>>>>>
>>>>>>>> - 1 entry is a NULL entry
>>>>>>>>
>>>>>>>> .dynamic
>>>>>>>> - 88 bytes
>>>>>>>> - Array of Elf32_Dyn
>>>>>>>> - typedef struct {
>>>>>>>> Elf32_Sword d_tag;
>>>>>>>> union {
>>>>>>>> Elf32_Word d_val;
>>>>>>>> Elf32_Addr d_ptr;
>>>>>>>> } d_un;
>>>>>>>> } Elf32_Dyn;
>>>>>>>> - 0x11 entries
>>>>>>>> [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>>>>>>>> [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>>>>>>>> [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>>>>>>>> [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>>>>>>>> [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>>>>>>>> [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>>>>>>>> [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>>>>>>>> [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>>>>>>>> [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>>>>>>>>
>>>>>>>>
>>>>>>> How big DT_REL is
>>>>>>>
>>>>>>>
>>>>>>>> [09] 0x00000013, 0x00000008 DT_RELENT, ???
>>>>>>>>
>>>>>>>>
>>>>>>> hmm, cannot remeber :)
>>>>>>>
>>>>>>>
>>>>>> How big an entry in DT_REL is
>>>>>>
>>>>>>
>>>>> Right, how could I forget :)
>>>>>
>>>>>
>>>>>>>> [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>>>>>>>>
>>>>>>>>
>>>>>>> Oops, you got text relocations. This is generally a bad thing.
>>>>>>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
>>>>>>> to modify the .text segment to adjust for relocation.
>>>>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
>>>>>>> the culprit.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Alas I cannot - The relocations are a result of loading a register with a
>>>>>> return address when calling show_boot_progress in the very early stages of
>>>>>> initialisation prior to the stack becoming available. The x86 does not
>>>>>> allow direct access to the IP so the only way to find the 'current
>>>>>> execution address' is to 'call' to the next instruction and pop the return
>>>>>> address off the stack
>>>>>>
>>>>>>
>>>>> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
>>>>> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
>>>>> but it is not important ATM I guess.
>>>>>
>>>>> Evil idea, skip -fpic et. all and add the full reloc procedure
>>>>> to relocate by rewriting directly in TEXT segment. Then you save space
>>>>> but you need more relocation code. Something like dl_do_reloc from
>>>>> uClibc. Wonder how much extra code that would be? Not too much I think.
>>>>>
>>>>>
>>>>>
>>>> With the following flags
>>>>
>>>> PLATFORM_RELFLAGS += -fvisibility=hidden
>>>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>>>> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
>>>>
>>>> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
>>>> this might mean I need the symbol table in the binary in order to resolve
>>>> them
>>>>
>>>>
>
> BTW, how many relocs do you get compared with -fPIC? I suspect you more
> now but hopefully not that many more.
>
>
>>> Possibly, but I think you only need to add an offset to all those
>>> relocs.
>>>
>>>
>> Almost right. The relocations specify a symbol value that needs to be
>> added to the data in memory to relocate the reference. The symbol values
>> involved should be the start of the text section for program references,
>> the start of the uninitialized data section for bss references, and the
>> start of the data section for initialized data and constants. So there
>> are about four symbols whose value you need to keep. Take a look at
>> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
>> already looked at) and it tells you what to do with R_386_PC32 ad
>> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
>> will remove all the symbols you don't actually need, but I don't know
>> that for sure. Note also that you can change the section flags of a
>> section marked noload to load.
>>
>
> Still think you can get away with just ADDING an offset. The image is linked to a
> specific address and then you move the whole image to a new address. Therefore
> you should be able to read the current address, add offset, write back the new address.
>
> Normally one do what you describe but here we know that the whole img has moved so
> we don't have to do calculate the new address from scratch.
>
If the addresses of the bss, text, and data segments change by the same
value, I think you are correct. However, if the text and data/bss
segments are moved by different offsets, naturally the relocations would
be different. One reason to retain this capability would be to allow the
u-boot copy to execute in place in NOR flash while re-locating the
read-write storage once memory has been sized. Having different
relocation factors is not much worse than just one, and it may be just
as easy to get working initially as a single relocation constant.
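A rough sketch of what "different relocation factors" could mean in code: pick the delta according to which link-time region the stored address points into. The region struct, the function name and the address ranges in the usage note are assumptions for illustration only:

```c
#include <stdint.h>

/* One link-time address range [start, end) and how far it has moved. */
struct region {
    uint32_t start;
    uint32_t end;
    uint32_t delta;
};

/*
 * Return the run-time value for a link-time address, using the delta of
 * whichever region contains it. Addresses outside every region are
 * returned unchanged.
 */
static uint32_t relocate_addr(uint32_t addr,
                              const struct region *regions, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        if (addr >= regions[i].start && addr < regions[i].end)
            return addr + regions[i].delta;
    }
    return addr;
}
```

For example, with text at 0x38040000..0x38050000 left in place (delta 0) and data at 0x38050000..0x38060000 moved by 0x08000000, a pointer to 0x380518a4 becomes 0x400518a4 while 0x38048845 stays as it is. Once the deltas differ, PC-relative references that cross from one region to the other can no longer be ignored.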
FWIW, the "ultimate" solution to minimum relocation size is a
post-processing step that creates "several" arrays of relocation offsets
as two byte quantities. This reduces the cost of each relocation entry
to just a bit more than two bytes (there is a small overhead for array
size, MSB values and relocation offset selection.) Naturally, this is
much less than the ELF version of the same relocations, because we do
not need to retain as much information and ELF doesn't worry about size
that much. This may pacify users for whom the flash size of the image
is critical, at the expense of an extra link step. Naturally, getting
things to work with "standard ELF" is the most important step, and
probably enough for most people.
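A minimal sketch of the compact table idea described above, assuming a single relocation delta and a hypothetical layout in which each group stores the shared upper address bits once and the entries are 16-bit offsets. The names reloc_group and apply_compact_relocs are invented for illustration and are not part of U-Boot or any toolchain:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compact table: one group per 64 KiB window of the image. */
struct reloc_group {
	uint32_t base;          /* upper bits shared by all entries (image-relative) */
	uint16_t count;         /* number of 16-bit offsets in this group */
	const uint16_t *offs;   /* low 16 bits of each location to patch */
};

/* Add delta to every 32-bit word named by the groups. */
static void apply_compact_relocs(uint8_t *image,
				 const struct reloc_group *groups,
				 size_t ngroups, uint32_t delta)
{
	for (size_t g = 0; g < ngroups; g++) {
		for (uint16_t i = 0; i < groups[g].count; i++) {
			uint8_t *where = image + groups[g].base + groups[g].offs[i];
			uint32_t word;

			memcpy(&word, where, sizeof(word));  /* may be unaligned */
			word += delta;
			memcpy(where, &word, sizeof(word));
		}
	}
}
```

The per-group header is the "small overhead" mentioned above. A post-processing tool would have to generate these groups from the ELF relocation sections after the final link.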
I also am interested in the number of additional relocations generated
without -fpic. I suspect on the 386 it can be substantial. However, for
every new reloc generated, a .got reference load will probably be
eliminated. This should result in a shorter text segment to balance the
increased relocation segment. Adding the -fno-jump-tables gcc option may
also help a bit.
Bill Campbell
> Jocke
>
>
>
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-14 15:35 ` J. William Campbell
@ 2009-10-14 16:05 ` Joakim Tjernlund
2009-10-14 16:49 ` J. William Campbell
0 siblings, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-14 16:05 UTC (permalink / raw)
To: u-boot
"J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 17:35:44:
>
> Joakim Tjernlund wrote:
> > "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
> >
> >> Joakim Tjernlund wrote:
> >>
> >>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
> >>>
> >>>
> >>>
> >>>> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
> >>>> <joakim.tjernlund@transmode.se> wrote:
> >>>>
> >>>>
> >>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
> >>>>>
> >>>>>
> >>>>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
> >>>>>> <joakim.tjernlund@transmode.se> wrote:
> >>>>>>
> >>>>>>
> >>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
> >>>>>>>
> >>>>>>>
> >>>>>> [Massive Snip :)]
[Yet another SNIP :)]
> >>>>> Evil idea, skip -fpic et. all and add the full reloc procedure
> >>>>> to relocate by rewriting directly in TEXT segment. Then you save space
> >>>>> but you need more relocation code. Something like dl_do_reloc from
> >>>>> uClibc. Wonder how much extra code that would be? Not too much I think.
> >>>>>
> >>>>>
> >>>>>
> >>>> With the following flags
> >>>>
> >>>> PLATFORM_RELFLAGS += -fvisibility=hidden
> >>>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
> >>>> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
> >>>>
> >>>> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
> >>>> this might mean I need the symbol table in the binary in order to resolve
> >>>> them
> >>>>
> >>>>
> >
> > BTW, how many relocs do you get compared with -fPIC? I suspect you more
> > now but hopefully not that many more.
> >
> >
> >>> Possibly, but I think you only need to add an offset to all those
> >>> relocs.
> >>>
> >>>
> >> Almost right. The relocations specify a symbol value that needs to be
> >> added to the data in memory to relocate the reference. The symbol values
> >> involved should be the start of the text section for program references,
> >> the start of the uninitialized data section for bss references, and the
> >> start of the data section for initialized data and constants. So there
> >> are about four symbols whose value you need to keep. Take a look at
> >> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
> >> already looked at) and it tells you what to do with R_386_PC32 ad
> >> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
> >> will remove all the symbols you don't actually need, but I don't know
> >> that for sure. Note also that you can change the section flags of a
> >> section marked noload to load.
> >>
> >
> > Still think you can get away with just ADDING an offset. The image is linked to a
> > specific address and then you move the whole image to a new address. Therefore
> > you should be able to read the current address, add offset, write back the
> new address.
> >
> > Normally one do what you describe but here we know that the whole img has moved so
> > we don't have to do calculate the new address from scratch.
> >
> If the addresses of the bss, text, and data segments change by the same
> value, I think you are correct. However, if the text and data/bss
> segments are moved by different offsets, naturally the relocations would
> be different. One reason to retain this capability would be to allow the
> u-boot copy to execute in place in NOR flash while re-locating the
> read-write storage once memory has been sized. Having different
> relocation factors is not much worse than just one, and it may be just
> as easy to get working initially as a single relocation constant.
How do you figure that? You need to rewrite the insn to access the moved
data/bss and they are in flash, did I miss something?
>
> FWIW, the "ultimate" solution to minimum relocation size is a
> post-processing step that creates "several" arrays of relocation offsets
> as two byte quantities. This reduces the cost of each relocation entry
> to just a bit more than two bytes (there is a small overhead for array
> size, MSB values and relocation offset selection.) Naturally, this is
> much less than the ELF version of the same relocations, because we do
> not need to retain as much information and ELF doesn't worry about size
> that much.. This may pacify users for which the flash size of the image
> is critical, at the expense of an extra link step. Naturally, getting
> things to work with "standard ELF" is the most important step, and
> probably enough for most people.
That would save 2+4 bytes/reloc on REL arches and
2+4+4 on RELA(ppc) (provided one can ignore r_addend)
But yes, this is probably too "fancy" for the moment.
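The arithmetic behind those savings can be checked with a short sketch. The struct layouts follow the ELF32 REL and RELA record formats. The 2-byte entry size is the idealised figure from the proposal and ignores its per-array overhead, and saved_per_reloc is a hypothetical helper:

```c
#include <stdint.h>

/* Same layout as Elf32_Rel: 8 bytes per record. */
struct elf32_rel {
	uint32_t r_offset;
	uint32_t r_info;
};

/* Same layout as Elf32_Rela: 12 bytes per record. */
struct elf32_rela {
	uint32_t r_offset;
	uint32_t r_info;
	int32_t  r_addend;
};

/* Bytes saved per relocation if each record shrinks to entry_size bytes. */
static unsigned saved_per_reloc(unsigned record_size, unsigned entry_size)
{
	return record_size - entry_size;
}
```

With 2-byte entries this gives 8 - 2 = 6 bytes (2+4) on REL and 12 - 2 = 10 bytes (2+4+4) on RELA.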
Jocke
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-14 12:38 ` Joakim Tjernlund
@ 2009-10-14 16:45 ` J. William Campbell
2009-10-17 5:17 ` Graeme Russ
0 siblings, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-14 16:45 UTC (permalink / raw)
To: u-boot
Joakim Tjernlund wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 14/10/2009 13:48:27:
>
>> On Wed, Oct 14, 2009 at 6:25 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>>
>>> "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
>>>
>>>> Joakim Tjernlund wrote:
>>>>
>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
>>>>>
>>>>>
>>>>>
>>>>>> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>
>>>>>>
>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>>>>>>>
>>>>>>>
>>>>>>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>>>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>>>>>>>>
>>>>>>>>>
>>>>>>>> [Massive Snip :)]
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> So, all that is left are .dynsym and .dynamic ...
>>>>>>>>>> .dynsym
>>>>>>>>>> - Contains 70 entries (16 bytes each, 1120 bytes)
>>>>>>>>>> - 44 entries mimic those entries in .got which are not relocated
>>>>>>>>>> - 21 entries are the remaining symbols exported from the linker
>>>>>>>>>> script
>>>>>>>>>> - 4 entries are labels defined in inline asm and used in C
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Try adding proper asm declarations. Look at what gcc
>>>>>>>>> generates for a function/variable and mimic these.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Thanks - Now .dynsym contains only exports from the linker script
>>>>>>>>
>>>>>>>>
>>>>>>> :)
>>>>>>>
>>>>>>>
>>>>>>>>>> - 1 entry is a NULL entry
>>>>>>>>>>
>>>>>>>>>> .dynamic
>>>>>>>>>> - 88 bytes
>>>>>>>>>> - Array of Elf32_Dyn
>>>>>>>>>> - typedef struct {
>>>>>>>>>> Elf32_Sword d_tag;
>>>>>>>>>> union {
>>>>>>>>>> Elf32_Word d_val;
>>>>>>>>>> Elf32_Addr d_ptr;
>>>>>>>>>> } d_un;
>>>>>>>>>> } Elf32_Dyn;
>>>>>>>>>> - 0x11 entries
>>>>>>>>>> [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>>>>>>>>>> [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>>>>>>>>>> [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>>>>>>>>>> [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>>>>>>>>>> [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>>>>>>>>>> [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>>>>>>>>>> [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>>>>>>>>>> [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>>>>>>>>>> [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> How big DT_REL is
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> [09] 0x00000013, 0x00000008 DT_RELENT, ???
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> hmm, cannot remeber :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>> How big an entry in DT_REL is
>>>>>>>>
>>>>>>>>
>>>>>>> Right, how could I forget :)
>>>>>>>
>>>>>>>
>>>>>>>>>> [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Oops, you got text relocations. This is generally a bad thing.
>>>>>>>>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
>>>>>>>>> to modify the .text segment to adjust for relocation.
>>>>>>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
>>>>>>>>> the culprit.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Alas I cannot - The relocations are a result of loading a register with a
>>>>>>>> return address when calling show_boot_progress in the very early stages of
>>>>>>>> initialisation prior to the stack becoming available. The x86 does not
>>>>>>>> allow direct access to the IP so the only way to find the 'current
>>>>>>>> execution address' is to 'call' to the next instruction and pop the return
>>>>>>>> address off the stack
>>>>>>>>
>>>>>>>>
>>>>>>> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
>>>>>>> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
>>>>>>> but it is not important ATM I guess.
>>>>>>>
>>>>>>> Evil idea, skip -fpic et. all and add the full reloc procedure
>>>>>>> to relocate by rewriting directly in TEXT segment. Then you save space
>>>>>>> but you need more relocation code. Something like dl_do_reloc from
>>>>>>> uClibc. Wonder how much extra code that would be? Not too much I think.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> With the following flags
>>>>>>
>>>>>> PLATFORM_RELFLAGS += -fvisibility=hidden
>>>>>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>>>>>> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
>>>>>>
>>>>>> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
>>>>>> this might mean I need the symbol table in the binary in order to resolve
>>>>>> them
>>>>>>
>>>>>>
>>> BTW, how many relocs do you get compared with -fPIC? I suspect you more
>>> now but hopefully not that many more.
>>>
>>>
>>>>> Possibly, but I think you only need to add an offset to all those
>>>>> relocs.
>>>>>
>>>>>
>>>> Almost right. The relocations specify a symbol value that needs to be
>>>> added to the data in memory to relocate the reference. The symbol values
>>>> involved should be the start of the text section for program references,
>>>> the start of the uninitialized data section for bss references, and the
>>>> start of the data section for initialized data and constants. So there
>>>> are about four symbols whose value you need to keep. Take a look at
>>>> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
>>>> already looked at) and it tells you what to do with R_386_PC32 ad
>>>> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
>>>> will remove all the symbols you don't actually need, but I don't know
>>>> that for sure. Note also that you can change the section flags of a
>>>> section marked noload to load.
>>>>
>>> Still think you can get away with just ADDING an offset. The image is linked to a
>>> specific address and then you move the whole image to a new address. Therefore
>>> you should be able to read the current address, add offset, write back the
>>>
>> new address.
>>
>> OK, I don't really get this at all....
>>
>> This code:
>>
>> printf ("\n\n%s\n\n", version_string);
>>
>> gets compiled into:
>>
>> 380403e7: 68 a4 18 05 38 push $0x380518a4
>> 380403ec: 68 de 2c 05 38 push $0x38052cde
>> 380403f1: e8 4f 84 00 00 call 38048845 <printf>
>>
>> With relocation entries in .rel.text of:
>>
>> Offset Info Type Sym.Value Sym. Name
>> 380403e8 00016201 R_386_32 380519f0 version_string
>> 380403ed 00000201 R_386_32 380519f0 .rodata
>> 380403f2 00016b02 R_386_PC32 38048991 printf
>>
>> Now I get the first two (R_386_32) entries - Relocation involves a simple
>> addition of an offset to the values at addresses 0x380403e8 and 0x380403ed
>> (of course, these addresses will be offset)
>>
>> However, the R_386_PC32 is an enigma - The call is already relative -
>> there is no need to relocate it at all (call is a position independent
>> opcode because it is a relative jump!)
>>
>
> Yes, but printf is defined in glibc s? the app needs to relocate the call
> to glibc.
Actually, the reason the call is relocatable is that the compiler
DOESN'T KNOW where printf is at all. If it is in a library, it will not
be in the text segment and must be relocated accordingly. It may be in
a different segment for some reason. In any case, the compiler doesn't
know the address in the image where printf resides, so it needs a
relocation entry to get the value filled in at link time. After the
value is filled in, if the referenced symbol is in the same segment
(probably .text) as the point of reference, the relocation reference is
probably of no more use. However, there is no rule that says the linker
must delete the reference from the relocation list.
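As a worked check of the value the linker fills in, here is a small sketch using the addresses from the disassembly quoted above. It assumes the usual i386 convention that a call's R_386_PC32 carries an in-place addend of -4, and the helper names are hypothetical:

```c
#include <stdint.h>

/* Target of an x86 "call rel32": address of the next instruction + rel32. */
static uint32_t call_target(uint32_t insn_addr, int32_t rel32)
{
	return insn_addr + 5u + (uint32_t)rel32;   /* 1 opcode byte + 4 displacement bytes */
}

/* Value stored for R_386_PC32: S + A - P. */
static uint32_t pc32_value(uint32_t s, int32_t a, uint32_t p)
{
	return s + (uint32_t)a - p;
}
```

For the call at 0x380403f1 with displacement 0x844f, call_target() gives 0x38048845. In the other direction, pc32_value(0x38048845, -4, 0x380403f2) gives 0x844f, the displacement bytes shown in the disassembly.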
> U-boot has all it needs so there you should not have PC32 I think.
> Try defining a local static function. For non static functions
> you may need to define visibility=hidden and/or -Bsymbolic too.
>
Won't help. Any symbols referenced but not defined locally are
relocatable. After linking, they MAY, but need not, go away.
> You also need to look at the img after final linking.
>
After linking, if the symbol is defined, the R_386_PC32 is no longer
important UNLESS the symbol referenced is in a different segment AND the
segments are relocated with different offsets from each other than
originally linked. For this reason, I think the linker will not discard
these relocations. If we are not relocating the segments with different
relative offsets, we can ignore these relocations as the change in
offset will come out to be zero anyway. However, if you process them
normally, you will just add 0 and nothing will change.
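The "add 0" observation can be sketched as follows, assuming the only relocation types present are R_386_32 and R_386_PC32. The function reloc_adjust and its two delta parameters are hypothetical names introduced for illustration:

```c
#include <stdint.h>

enum {
	TYPE_386_32   = 1,   /* R_386_32:   S + A     (absolute address) */
	TYPE_386_PC32 = 2    /* R_386_PC32: S + A - P (PC-relative)      */
};

/*
 * Adjustment to add to the 32-bit word being patched.
 * place_delta: how far the patched location itself moved.
 * sym_delta:   how far the referenced symbol moved.
 */
static uint32_t reloc_adjust(unsigned type, uint32_t place_delta,
			     uint32_t sym_delta)
{
	switch (type) {
	case TYPE_386_32:
		return sym_delta;
	case TYPE_386_PC32:
		return sym_delta - place_delta;  /* 0 when everything moves together */
	default:
		return 0;                        /* other types not handled here */
	}
}
```

When every segment moves by the same amount, place_delta equals sym_delta and the PC32 adjustment is zero, so those entries can be skipped or processed with the same result.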
>
>> Will all R_386_PC32 be like this? Can I simply ignore them all? If so, why
>> do they even need to be generated?
>>
>
> Hopefully you won't have any.
I think they may still be there, because we ask the linker to preserve
relocation information. However, if the entire image is being relocated,
not changing the order or relative offset of any segments, they can be
ignored, because the relative values will not change. It will be
interesting to know if they remain or if the linker drops them out. For
references in the same segment, we can hope that they get dropped. For
references across segments (if any), or any undefined symbols, they will
remain.
> Not sure about weak functions though. These might
> need PC32 relocs in some cases.
>
There can be PC32 relocs referencing the weak symbol, but that symbol
may be undefined.
> Also, if you look at _dl_do_reloc() in uClibc/ldso/ldso/i386/elfinterp.c I think
> you can replace symbol_addr with relocation offset.
>
I agree, in the case where you are moving the entire image and ignoring PC32 relocs.
Best Regards,
Bill Campbell
> Jocke
>
>
>
>
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-14 16:05 ` Joakim Tjernlund
@ 2009-10-14 16:49 ` J. William Campbell
0 siblings, 0 replies; 47+ messages in thread
From: J. William Campbell @ 2009-10-14 16:49 UTC (permalink / raw)
To: u-boot
Joakim Tjernlund wrote:
> "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 17:35:44:
>
>> Joakim Tjernlund wrote:
>>
>>> "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
>>>
>>>
>>>> Joakim Tjernlund wrote:
>>>>
>>>>
>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>>>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> [Massive Snip :)]
>>>>>>>>
>
> [Yet another SNIP :)]
>
>
>>>>>>> Evil idea, skip -fpic et. all and add the full reloc procedure
>>>>>>> to relocate by rewriting directly in TEXT segment. Then you save space
>>>>>>> but you need more relocation code. Something like dl_do_reloc from
>>>>>>> uClibc. Wonder how much extra code that would be? Not too much I think.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> With the following flags
>>>>>>
>>>>>> PLATFORM_RELFLAGS += -fvisibility=hidden
>>>>>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>>>>>> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
>>>>>>
>>>>>> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
>>>>>> this might mean I need the symbol table in the binary in order to resolve
>>>>>> them
>>>>>>
>>>>>>
>>>>>>
>>> BTW, how many relocs do you get compared with -fPIC? I suspect you more
>>> now but hopefully not that many more.
>>>
>>>
>>>
>>>>> Possibly, but I think you only need to add an offset to all those
>>>>> relocs.
>>>>>
>>>>>
>>>>>
>>>> Almost right. The relocations specify a symbol value that needs to be
>>>> added to the data in memory to relocate the reference. The symbol values
>>>> involved should be the start of the text section for program references,
>>>> the start of the uninitialized data section for bss references, and the
>>>> start of the data section for initialized data and constants. So there
>>>> are about four symbols whose value you need to keep. Take a look at
>>>> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
>>>> already looked at) and it tells you what to do with R_386_PC32 ad
>>>> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
>>>> will remove all the symbols you don't actually need, but I don't know
>>>> that for sure. Note also that you can change the section flags of a
>>>> section marked noload to load.
>>>>
>>>>
>>> Still think you can get away with just ADDING an offset. The image is linked to a
>>> specific address and then you move the whole image to a new address. Therefore
>>> you should be able to read the current address, add offset, write back the
>>>
>> new address.
>>
>>> Normally one do what you describe but here we know that the whole img has moved so
>>> we don't have to do calculate the new address from scratch.
>>>
>>>
>> If the addresses of the bss, text, and data segments change by the same
>> value, I think you are correct. However, if the text and data/bss
>> segments are moved by different offsets, naturally the relocations would
>> be different. One reason to retain this capability would be to allow the
>> u-boot copy to execute in place in NOR flash while re-locating the
>> read-write storage once memory has been sized. Having different
>> relocation factors is not much worse than just one, and it may be just
>> as easy to get working initially as a single relocation constant.
>>
>
> How do figure that? You need to rewrite the insn to access the moved
> data/bss and they are in flash, did I miss something?
>
No, I did. You are quite correct, there would be references in flash
that couldn't be fixed. Sorry about that.
Best Regards,
Bill Campbell
>
>> FWIW, the "ultimate" solution to minimum relocation size is a
>> post-processing step that creates "several" arrays of relocation offsets
>> as two byte quantities. This reduces the cost of each relocation entry
>> to just a bit more than two bytes (there is a small overhead for array
>> size, MSB values and relocation offset selection.) Naturally, this is
>> much less than the ELF version of the same relocations, because we do
>> not need to retain as much information and ELF doesn't worry about size
>> that much.. This may pacify users for which the flash size of the image
>> is critical, at the expense of an extra link step. Naturally, getting
>> things to work with "standard ELF" is the most important step, and
>> probably enough for most people.
>>
>
> That would save 2+4 bytes/reloc on REL arches and
> 2+4+4 on RELA(ppc) (provided one can ignore r_addend)
>
> But yes, this is probably too "fancy" for the moment.
>
> Jocke
>
>
>
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* [U-Boot] Relocation size penalty calculation
2009-10-14 16:45 ` J. William Campbell
@ 2009-10-17 5:17 ` Graeme Russ
2009-10-17 12:32 ` Joakim Tjernlund
2009-10-17 12:59 ` J. William Campbell
0 siblings, 2 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-17 5:17 UTC (permalink / raw)
To: u-boot
On Thu, Oct 15, 2009 at 3:45 AM, J. William Campbell
<jwilliamcampbell@comcast.net> wrote:
> Joakim Tjernlund wrote:
>>
>> Graeme Russ <graeme.russ@gmail.com> wrote on 14/10/2009 13:48:27:
>>
>>>
>>> On Wed, Oct 14, 2009 at 6:25 PM, Joakim Tjernlund
>>> <joakim.tjernlund@transmode.se> wrote:
>>>
>>>>
>>>> "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009
>>>> 01:48:52:
>>>>
>>>>>
>>>>> Joakim Tjernlund wrote:
>>>>>
>>>>>>
>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
>>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>>>>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [Massive Snip :)]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So, all that is left are .dynsym and .dynamic ...
>>>>>>>>>>> .dynsym
>>>>>>>>>>> - Contains 70 entries (16 bytes each, 1120 bytes)
>>>>>>>>>>> - 44 entries mimic those entries in .got which are not
>>>>>>>>>>> relocated
>>>>>>>>>>> - 21 entries are the remaining symbols exported from the
>>>>>>>>>>> linker
>>>>>>>>>>> script
>>>>>>>>>>> - 4 entries are labels defined in inline asm and used in C
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Try adding proper asm declarations. Look at what gcc
>>>>>>>>>> generates for a function/variable and mimic these.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks - Now .dynsym contains only exports from the linker script
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> :)
>>>>>>>>
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> - 1 entry is a NULL entry
>>>>>>>>>>>
>>>>>>>>>>> .dynamic
>>>>>>>>>>> - 88 bytes
>>>>>>>>>>> - Array of Elf32_Dyn
>>>>>>>>>>> - typedef struct {
>>>>>>>>>>> Elf32_Sword d_tag;
>>>>>>>>>>> union {
>>>>>>>>>>> Elf32_Word d_val;
>>>>>>>>>>> Elf32_Addr d_ptr;
>>>>>>>>>>> } d_un;
>>>>>>>>>>> } Elf32_Dyn;
>>>>>>>>>>> - 0x11 entries
>>>>>>>>>>> [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>>>>>>>>>>> [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>>>>>>>>>>> [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>>>>>>>>>>> [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>>>>>>>>>>> [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>>>>>>>>>>> [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>>>>>>>>>>> [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>>>>>>>>>>> [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>>>>>>>>>>> [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> How big DT_REL is
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [09] 0x00000013, 0x00000008 DT_RELENT, ???
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> hmm, cannot remeber :)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> How big an entry in DT_REL is
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Right, how could I forget :)
>>>>>>>>
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Oops, you got text relocations. This is generally a bad thing.
>>>>>>>>>> TEXTREL is commonly caused by asm code that arent truly pic so it
>>>>>>>>>> needs
>>>>>>>>>> to modify the .text segment to adjust for relocation.
>>>>>>>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to
>>>>>>>>>> find
>>>>>>>>>> the culprit.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Alas I cannot - The relocations are a result of loading a register
>>>>>>>>> with a
>>>>>>>>> return address when calling show_boot_progress in the very early
>>>>>>>>> stages of
>>>>>>>>> initialisation prior to the stack becoming available. The x86 does
>>>>>>>>> not
>>>>>>>>> allow direct access to the IP so the only way to find the 'current
>>>>>>>>> execution address' is to 'call' to the next instruction and pop the
>>>>>>>>> return
>>>>>>>>> address off the stack
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> hmm, same as ppc but that in it self should not cause a TEXREL,
>>>>>>>> should it?
>>>>>>>> Ahh, the 'call' is absolute, not relative? I guess there is some way
>>>>>>>> around it
>>>>>>>> but it is not important ATM I guess.
>>>>>>>>
>>>>>>>> Evil idea, skip -fpic et. all and add the full reloc procedure
>>>>>>>> to relocate by rewriting directly in TEXT segment. Then you save
>>>>>>>> space
>>>>>>>> but you need more relocation code. Something like dl_do_reloc from
>>>>>>>> uClibc. Wonder how much extra code that would be? Not too much I
>>>>>>>> think.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> With the following flags
>>>>>>>
>>>>>>> PLATFORM_RELFLAGS += -fvisibility=hidden
>>>>>>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>>>>>>> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic
>>>>>>> -Bsymbolic-functions
>>>>>>>
>>>>>>> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I
>>>>>>> think
>>>>>>> this might mean I need the symbol table in the binary in order to
>>>>>>> resolve
>>>>>>> them
>>>>>>>
>>>>>>>
>>>>
>>>> BTW, how many relocs do you get compared with -fPIC? I suspect you more
>>>> now but hopefully not that many more.
>>>>
>>>>
>>>>>>
>>>>>> Possibly, but I think you only need to add an offset to all those
>>>>>> relocs.
>>>>>>
>>>>>>
>>>>>
>>>>> Almost right. The relocations specify a symbol value that needs to be
>>>>> added to the data in memory to relocate the reference. The symbol
>>>>> values
>>>>> involved should be the start of the text section for program
>>>>> references,
>>>>> the start of the uninitialized data section for bss references, and the
>>>>> start of the data section for initialized data and constants. So there
>>>>> are about four symbols whose value you need to keep. Take a look at
>>>>> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
>>>>> already looked at) and it tells you what to do with R_386_PC32 ad
>>>>> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
>>>>> will remove all the symbols you don't actually need, but I don't know
>>>>> that for sure. Note also that you can change the section flags of a
>>>>> section marked noload to load.
>>>>>
>>>>
>>>> Still think you can get away with just ADDING an offset. The image is
>>>> linked to a
>>>> specific address and then you move the whole image to a new address.
>>>> Therefore
>>>> you should be able to read the current address, add offset, write back
>>>> the
>>>>
>>>
>>> new address.
>>> OK, I don't really get this at all....
>>>
>>> This code:
>>>
>>> printf ("\n\n%s\n\n", version_string);
>>>
>>> gets compiled into:
>>>
>>> 380403e7: 68 a4 18 05 38 push $0x380518a4
>>> 380403ec: 68 de 2c 05 38 push $0x38052cde
>>> 380403f1: e8 4f 84 00 00 call 38048845 <printf>
>>>
>>> With relocation entries in .rel.text of:
>>>
>>> Offset Info Type Sym.Value Sym. Name
>>> 380403e8 00016201 R_386_32 380519f0 version_string
>>> 380403ed 00000201 R_386_32 380519f0 .rodata
>>> 380403f2 00016b02 R_386_PC32 38048991 printf
>>>
>>> Now I get the first two (R_386_32) entries - Relocation involves a simple
>>> addition of an offset to the values at addresses 0x380403e8 and
>>> 0x380403ed
>>> (of course, these addresses will be offset)
>>>
>>> However, the R_386_PC32 is an enigma - The call is already relative -
>>> there is no need to relocate it at all (call is a position independent
>>> opcode because it is a relative jump!)
>>>
>>
>> Yes, but printf is defined in glibc s? the app needs to relocate the call
>> to glibc.
>
> Actually, the reason the call is relocatable is that the compiler DOESN'T
> KNOW where printf is at all. If it is in a library, it will not be in the
> text segment and must be relocated accordingly. It may be in a different
> segment for some reason. In any case, the compiler doesn't know the address
> in the image where printf resides, so it needs a relocation entry to get the
> value filled in at link time. After the value is filled in, if the
> referenced symbol is in the same segment (probably .text) as the point of
> reference, the relocation reference is probably of no more use. However,
> there is no rule that says the linker must delete the reference from the
> relocation list.
>>
>> U-boot has all it needs so there you should not have PC32 I think.
>> Try defining a local static function. For non static functions
>> you may need to define visibility=hidden and/or -Bsymbolic too.
>>
>
> Won't help. Any symbols referenced but not defined locally are relocatable.
> After linking, they MAY, but need not, go away.
>>
>> You also need to look at the img after final linking.
>>
>
> After linking, if the symbol is defined, the R_386_PC32 is no longer
> important UNLESS the symbol referenced is in a different segment AND the
> segments are relocated with different offsets from each other than
> originally linked. For this reason, I think the linker will not discard
> these relocations. If we are not relocating the segments with different
> relative offsets, we can ignore these relocations as the change in offset
> will come out to be zero anyway. However, if you process them normally, you
> will just add 0 and nothing will change.
>>
>>
>>>
>>> Will all R_386_PC32 be like this? Can I simply ignore them all? If so,
>>> why
>>> do they even need to be generated?
>>>
>>
>> Hopefully you won't have any.
>
> I think they may still be there, because we ask the linker to preserve
> relocation information. However, if the entire image is being relocated, not
> changing the order or relative offset of any segments, they can be ignored,
> because the relative values will not change. It will be interesting to know
> if they remain or if the linker drops them out. For references in the same
> segment, we can hope that they get dropped. For references across segments
> (if any), or any undefined symbols, they will remain.
>>
>> Not sure about weak functions though. These might
>> need PC32 relocs in some cases.
>>
>
> There can be PC32 relocs referencing the weak symbol, but that symbol may be
> undefined.
>>
>> Also, if you look at _dl_do_reloc() in uClibc/ldso/ldso/i386/elfinterp.c I
>> think
>> you can replace symbol_addr with relocation offset.
>>
>
> I agree, in the case you a moving the entire image and ignoring PC32 relocs.
>
> Best Regards,
> Bill Campbell
>>
>> Jocke
>>
Apologies if this is getting way off-topic for a simple boot loader, but
this is information I have gathered from far and wide over the net. I am
surprised that there isn't a web site out there on 'How to create a
relocatable boot loader'...
OK, it's all starting to come together now. It helps when you look at the
right files ;)
Firstly, u-boot.map
0x380589a0 __rel_dyn_start = .
.rel.dyn 0x380589a0 0x42b0
*(.rel.dyn)
.rel.got 0x00000000 0x0 cpu/i386/start.o
.rel.plt 0x00000000 0x0 cpu/i386/start.o
.rel.text 0x380589a0 0x2e28 cpu/i386/start.o
.rel.start16 0x3805b7c8 0x10 cpu/i386/start.o
.rel.data 0x3805b7d8 0xc18 cpu/i386/start.o
.rel.rodata 0x3805c3f0 0x360 cpu/i386/start.o
.rel.u_boot_cmd
0x3805c750 0x500 cpu/i386/start.o
0x3805cc50 __rel_dyn_end = .
And the output of readelf...
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 38040000 001000 0118a4 00 AX 0 0 4
[ 2] .rel.text REL 00000000 066c68 005d00 08 40 1 4
[ 3] .rodata PROGBITS 380518a4 0128a4 005da5 00 A 0 0 4
[ 4] .rel.rodata REL 00000000 06c968 000360 08 40 3 4
[ 5] .interp PROGBITS 38057649 018649 000013 00 A 0 0 1
[ 6] .dynstr STRTAB 3805765c 01865c 0001ee 00 A 0 0 1
[ 7] .hash HASH 3805784c 01884c 0000cc 04 A 11 0 4
[ 8] .data PROGBITS 38057918 018918 000a3c 00 WA 0 0 4
[ 9] .rel.data REL 00000000 06ccc8 000c18 08 40 8 4
[10] .got.plt PROGBITS 38058354 019354 00000c 04 WA 0 0 4
[11] .dynsym DYNSYM 38058360 019360 000200 10 A 6 1 4
[12] .dynamic DYNAMIC 38058560 019560 000080 08 WA 6 0 4
[13] .u_boot_cmd PROGBITS 380585e0 0195e0 0003c0 00 WA 0 0 4
[14] .rel.u_boot_cmd REL 00000000 06d8e0 000500 08 40 13 4
[15] .bss NOBITS 3805cc50 01ec50 001a34 00 WA 0 0 4
[16] .bios PROGBITS 00000000 01e000 00053e 00 AX 0 0 1
[17] .rel.bios REL 00000000 06dde0 0000c0 08 40 16 4
[18] .rel.dyn REL 380589a0 0199a0 0042b0 08 A 11 0 4
[19] .start16 PROGBITS 0000f800 01e800 000110 00 AX 0 0 1
[20] .rel.start16 REL 00000000 06dea0 000038 08 40 19 4
[21] .resetvec PROGBITS 0000fff0 01eff0 000010 00 AX 0 0 1
[22] .rel.resetvec REL 00000000 06ded8 000008 08 40 21 4
...
Relocation section '.rel.text' at offset 0x66c68 contains 2976 entries:
Offset Info Type Sym.Value Sym. Name
38040010 00000101 R_386_32 38040000 .text
3804001e 00000101 R_386_32 38040000 .text
38040028 00000101 R_386_32 38040000 .text
3804003f 00000101 R_386_32 38040000 .text
38040051 00000101 R_386_32 38040000 .text
38040075 00000101 R_386_32 38040000 .text
38040085 00000101 R_386_32 38040000 .text
3804009d 0003e602 R_386_PC32 380403fa load_uboot
380400a6 00000101 R_386_32 38040000 .text
38040015 00029f02 R_386_PC32 3804bdd8 early_board_init
38040023 0003f702 R_386_PC32 3804bdda show_boot_progress_asm
...
Relocation section '.rel.rodata' at offset 0x6c968 contains 108 entries:
Offset Info Type Sym.Value Sym. Name
38051908 00000201 R_386_32 380518a4 .rodata
38051938 00000201 R_386_32 380518a4 .rodata
38051968 00000201 R_386_32 380518a4 .rodata
38051998 00000201 R_386_32 380518a4 .rodata
380519c8 00000201 R_386_32 380518a4 .rodata
380519f8 00000201 R_386_32 380518a4 .rodata
...
Relocation section '.rel.dyn' at offset 0x199a0 contains 2134 entries:
Offset Info Type Sym.Value Sym. Name
0000f838 00000008 R_386_RELATIVE
0000f846 00000008 R_386_RELATIVE
38040010 00000008 R_386_RELATIVE
3804001e 00000008 R_386_RELATIVE
38040028 00000008 R_386_RELATIVE
3804003f 00000008 R_386_RELATIVE
38040051 00000008 R_386_RELATIVE
38040075 00000008 R_386_RELATIVE
38040085 00000008 R_386_RELATIVE
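As an aside on reading these listings: the Info column packs the symbol
index into the upper 24 bits and the relocation type into the low byte.
A minimal sketch of the decoding follows. The helper names are made up
for illustration; the type numbers are the standard i386 ones.

```c
#include <stdint.h>

/* i386 relocation type numbers as defined by the i386 ELF psABI. */
enum { R_386_32 = 1, R_386_PC32 = 2, R_386_RELATIVE = 8 };

/* Same split as the ELF32_R_SYM / ELF32_R_TYPE macros in <elf.h>. */
static uint32_t rel_sym(uint32_t r_info)
{
    return r_info >> 8;     /* upper 24 bits: symbol table index */
}

static uint32_t rel_type(uint32_t r_info)
{
    return r_info & 0xff;   /* low byte: relocation type */
}
```

So 00000101 is type 1 (R_386_32) against symbol 1, 0003e602 is type 2
(R_386_PC32) against symbol 0x3e6, and 00000008 is type 8
(R_386_RELATIVE) with no symbol at all.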
Notice that, apart from .rel.dyn, none of the .rel.* sections have the
A (Allocated) flag set, so they do not end up in the stripped binary image.
.rel.dyn is allocated in the binary image: all the R_386_PC32 entries
from the other .rel sections have been discarded, and the R_386_32 entries
have been 'converted' to R_386_RELATIVE, which are simple to process
(locate the word in memory and adjust it by the relocation offset).
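To illustrate why the R_386_PC32 entries can be dropped when the whole
image moves as one block, here is a small sketch. The helper names are
hypothetical, and the fields are modelled as plain integers rather than
real instruction bytes. A PC-relative field stores the target minus the
address just past the field, so a uniform shift cancels out; an absolute
field stores the target itself, so it moves with the image.

```c
#include <stdint.h>

/* Value stored in a 32-bit PC-relative field (e.g. the operand of CALL):
 * target address minus the address just past the 4-byte field. */
static int32_t pc32_field(uint32_t field_addr, uint32_t target)
{
    return (int32_t)(target - (field_addr + 4));
}

/* Value stored in a 32-bit absolute field: simply the target address. */
static uint32_t abs32_field(uint32_t target)
{
    return target;
}
```

With the load_uboot entry from the listing (field at 0x3804009d, target
0x380403fa), shifting both addresses by the same amount leaves
pc32_field() unchanged, while abs32_field() changes by exactly that
amount. That change is what an R_386_RELATIVE entry asks the fixup code
to apply.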
The relocation fixup is really easy:
    Elf32_Rel *rel_dyn_start = (Elf32_Rel *)&__rel_dyn_start;
    Elf32_Rel *rel_dyn_end = (Elf32_Rel *)&__rel_dyn_end;
    Elf32_Rel *re;

    for (re = rel_dyn_start; re < rel_dyn_end; re++)
    {
        /* Skip words that live below TEXT_BASE */
        if (re->r_offset >= TEXT_BASE)
            /* Skip stored values that point below TEXT_BASE */
            if (*(ulong *)re->r_offset >= TEXT_BASE)
                /* Patch the relocated copy of the word */
                *(ulong *)(re->r_offset - rel_offset) -= (Elf32_Addr)rel_offset;
    }
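For anyone who wants to try the idea on a host machine, here is a
self-contained sketch that patches a copy of an image held in an
ordinary buffer. It is an illustration under stated assumptions, not the
code above: struct rel32 is a local stand-in for Elf32_Rel,
apply_relative() is a made-up name, and delta is taken as new address
minus link address, which is the opposite sign to rel_offset in the loop
above.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as Elf32_Rel; redeclared so the sketch needs no <elf.h>. */
struct rel32 {
    uint32_t r_offset;  /* link-time address of the word to patch */
    uint32_t r_info;    /* (symbol index << 8) | relocation type */
};

#define RELOC_TYPE_RELATIVE 8u  /* R_386_RELATIVE */

/*
 * Patch a copy of an image that was linked at link_base and will run at
 * link_base + delta. `image` points at the first byte of the copy.
 * Returns the number of words patched.
 */
static unsigned apply_relative(uint8_t *image, uint32_t image_size,
                               uint32_t link_base, uint32_t delta,
                               const struct rel32 *rel, unsigned count)
{
    unsigned patched = 0;

    for (unsigned i = 0; i < count; i++) {
        uint32_t off = rel[i].r_offset - link_base;
        uint32_t word;

        if ((rel[i].r_info & 0xff) != RELOC_TYPE_RELATIVE)
            continue;   /* only RELATIVE entries are handled here */
        if (rel[i].r_offset < link_base || image_size < 4 ||
            off > image_size - 4)
            continue;   /* word lies outside the copied image */

        memcpy(&word, image + off, sizeof(word));   /* may be unaligned */
        word += delta;
        memcpy(image + off, &word, sizeof(word));
        patched++;
    }
    return patched;
}
```

The bounds check plays a role similar to the outer TEXT_BASE test: an
entry such as the one at 0x0000f838 falls outside the copied image and
is left alone.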
The size penalty is ~17kB of extra data (which is not copied to RAM) and
a tiny amount of relocation code (easily offset by removal of other fixups
such as the command table fixup).
And without using the pic flag in gcc, there is no GOT and no associated
performance penalty.
Thanks for everyone's help (especially Jocke and Bill)
Regards,
Graeme
* [U-Boot] Relocation size penalty calculation
2009-10-17 5:17 ` Graeme Russ
@ 2009-10-17 12:32 ` Joakim Tjernlund
2009-10-17 12:59 ` J. William Campbell
1 sibling, 0 replies; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-17 12:32 UTC (permalink / raw)
To: u-boot
Graeme Russ <graeme.russ@gmail.com> wrote on 17/10/2009 07:17:04:
>
[SNIP]
>
> Apologies if this is getting way off-topic for a simple boot loader, but
> this is information I have gathered from far and wide over the net. I am
> surprised that there isn't a web site out there on 'How to create a
> relocatable boot loader'...
:), now you can write one :)
>
> OK, its all starting to come together now - It helps when you look at the
> right files ;)
>
> Firstly, u-boot.map
>
> 0x380589a0 __rel_dyn_start = .
>
> .rel.dyn 0x380589a0 0x42b0
> *(.rel.dyn)
> .rel.got 0x00000000 0x0 cpu/i386/start.o
> .rel.plt 0x00000000 0x0 cpu/i386/start.o
> .rel.text 0x380589a0 0x2e28 cpu/i386/start.o
> .rel.start16 0x3805b7c8 0x10 cpu/i386/start.o
> .rel.data 0x3805b7d8 0xc18 cpu/i386/start.o
> .rel.rodata 0x3805c3f0 0x360 cpu/i386/start.o
> .rel.u_boot_cmd
> 0x3805c750 0x500 cpu/i386/start.o
> 0x3805cc50 __rel_dyn_end = .
>
>
> And the output of readelf...
>
> Section Headers:
> [Nr] Name Type Addr Off Size ES Flg Lk Inf Al
> [ 0] NULL 00000000 000000 000000 00 0 0 0
> [ 1] .text PROGBITS 38040000 001000 0118a4 00 AX 0 0 4
> [ 2] .rel.text REL 00000000 066c68 005d00 08 40 1 4
> [ 3] .rodata PROGBITS 380518a4 0128a4 005da5 00 A 0 0 4
> [ 4] .rel.rodata REL 00000000 06c968 000360 08 40 3 4
> [ 5] .interp PROGBITS 38057649 018649 000013 00 A 0 0 1
> [ 6] .dynstr STRTAB 3805765c 01865c 0001ee 00 A 0 0 1
> [ 7] .hash HASH 3805784c 01884c 0000cc 04 A 11 0 4
> [ 8] .data PROGBITS 38057918 018918 000a3c 00 WA 0 0 4
> [ 9] .rel.data REL 00000000 06ccc8 000c18 08 40 8 4
> [10] .got.plt PROGBITS 38058354 019354 00000c 04 WA 0 0 4
> [11] .dynsym DYNSYM 38058360 019360 000200 10 A 6 1 4
> [12] .dynamic DYNAMIC 38058560 019560 000080 08 WA 6 0 4
> [13] .u_boot_cmd PROGBITS 380585e0 0195e0 0003c0 00 WA 0 0 4
> [14] .rel.u_boot_cmd REL 00000000 06d8e0 000500 08 40 13 4
> [15] .bss NOBITS 3805cc50 01ec50 001a34 00 WA 0 0 4
> [16] .bios PROGBITS 00000000 01e000 00053e 00 AX 0 0 1
> [17] .rel.bios REL 00000000 06dde0 0000c0 08 40 16 4
> [18] .rel.dyn REL 380589a0 0199a0 0042b0 08 A 11 0 4
> [19] .start16 PROGBITS 0000f800 01e800 000110 00 AX 0 0 1
> [20] .rel.start16 REL 00000000 06dea0 000038 08 40 19 4
> [21] .resetvec PROGBITS 0000fff0 01eff0 000010 00 AX 0 0 1
> [22] .rel.resetvec REL 00000000 06ded8 000008 08 40 21 4
>
> ...
>
> Relocation section '.rel.text' at offset 0x66c68 contains 2976 entries:
> Offset Info Type Sym.Value Sym. Name
> 38040010 00000101 R_386_32 38040000 .text
> 3804001e 00000101 R_386_32 38040000 .text
> 38040028 00000101 R_386_32 38040000 .text
> 3804003f 00000101 R_386_32 38040000 .text
> 38040051 00000101 R_386_32 38040000 .text
> 38040075 00000101 R_386_32 38040000 .text
> 38040085 00000101 R_386_32 38040000 .text
> 3804009d 0003e602 R_386_PC32 380403fa load_uboot
> 380400a6 00000101 R_386_32 38040000 .text
> 38040015 00029f02 R_386_PC32 3804bdd8 early_board_init
> 38040023 0003f702 R_386_PC32 3804bdda show_boot_progress_asm
>
> ...
>
> Relocation section '.rel.rodata' at offset 0x6c968 contains 108 entries:
> Offset Info Type Sym.Value Sym. Name
> 38051908 00000201 R_386_32 380518a4 .rodata
> 38051938 00000201 R_386_32 380518a4 .rodata
> 38051968 00000201 R_386_32 380518a4 .rodata
> 38051998 00000201 R_386_32 380518a4 .rodata
> 380519c8 00000201 R_386_32 380518a4 .rodata
> 380519f8 00000201 R_386_32 380518a4 .rodata
>
> ...
>
> Relocation section '.rel.dyn' at offset 0x199a0 contains 2134 entries:
> Offset Info Type Sym.Value Sym. Name
> 0000f838 00000008 R_386_RELATIVE
> 0000f846 00000008 R_386_RELATIVE
> 38040010 00000008 R_386_RELATIVE
> 3804001e 00000008 R_386_RELATIVE
> 38040028 00000008 R_386_RELATIVE
> 3804003f 00000008 R_386_RELATIVE
> 38040051 00000008 R_386_RELATIVE
> 38040075 00000008 R_386_RELATIVE
> 38040085 00000008 R_386_RELATIVE
>
> Notice that, apart from .rel.dyn, non of the .rel.* sections have the
> A (Allocated) flag set - They do not end up in the stripped binary image.
> .rel.dyn is allocated in the binary image with all the R_386_PC32 entries
> from the other .rel section are discarded and the R_386_32 have been
> 'converted' to R_386_RELATIVE which are simple to adjust (locate in memory
> and adjust by the relocation offset)
Ah, they are converted to relative. I wonder if all archs do this?
If so, one will only need two reloc functions, one for Rel and
one for Rela relocs.
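A rough sketch of what those two functions could boil down to for
RELATIVE-type entries. The struct layouts follow Elf32_Rel and
Elf32_Rela; the function names and the delta parameter (new address
minus link address) are assumptions made for this example.

```c
#include <stdint.h>

/* Field layouts follow Elf32_Rel / Elf32_Rela from the ELF spec. */
struct rel32  { uint32_t r_offset; uint32_t r_info; };
struct rela32 { uint32_t r_offset; uint32_t r_info; int32_t r_addend; };

/* REL: the addend already sits in the word being patched. */
static void fixup_rel(uint32_t *word, uint32_t delta)
{
    *word += delta;
}

/* RELA: the addend comes from the entry; the old word is overwritten. */
static void fixup_rela(uint32_t *word, const struct rela32 *r, uint32_t delta)
{
    *word = (uint32_t)r->r_addend + delta;
}
```

The only real difference is where the link-time value is read from,
which is also why Rela entries are larger: 12 bytes instead of 8.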
>
> The relocation fixup is really easy:
>
> Elf32_Rel *rel_dyn_start = (Elf32_Rel *)&__rel_dyn_start;
> Elf32_Rel *rel_dyn_end = (Elf32_Rel *)&__rel_dyn_end;
> Elf32_Rel *re;
>
> for (re = rel_dyn_start; re < rel_dyn_end; re++)
> {
> if (re->r_offset >= TEXT_BASE)
> if (*(ulong *)re->r_offset >= TEXT_BASE)
> *(ulong *)(re->r_offset - rel_offset) -= (Elf32_Addr)rel_offset;
> }
Do you need the TEXT_BASE stuff or is it just a precaution?
Not sure if you need some test for NULL to handle weak undefined symbols though.
> The size penalty is ~17kB of extra data (which is not copied to RAM) and
> a tiny amount of relocation code (easily offset by removal of other fixups
> such as the command table fixup
17kB, how does that compare to the -fPIC version?
>
> Any without using the pic flag in gcc, there is no GOT and no associated
> performance penalty.
Yep :)
>
> Thanks for everyone's help (especially Jocke and Bill)
NP, will we see a patch soon?
Jocke
* [U-Boot] Relocation size penalty calculation
2009-10-17 5:17 ` Graeme Russ
2009-10-17 12:32 ` Joakim Tjernlund
@ 2009-10-17 12:59 ` J. William Campbell
2009-10-17 21:29 ` Graeme Russ
1 sibling, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-17 12:59 UTC (permalink / raw)
To: u-boot
Graeme Russ wrote:
> On Thu, Oct 15, 2009 at 3:45 AM, J. William Campbell
> <jwilliamcampbell@comcast.net> wrote:
>
>> Joakim Tjernlund wrote:
>>
>
<megasnip>
> Apologies if this is getting way off-topic for a simple boot loader, but
> this is information I have gathered from far and wide over the net. I am
> surprised that there isn't a web site out there on 'How to create a
> relocatable boot loader'...
>
> OK, its all starting to come together now - It helps when you look at the
> right files ;)
>
> Firstly, u-boot.map
>
> 0x380589a0 __rel_dyn_start = .
>
> .rel.dyn 0x380589a0 0x42b0
> *(.rel.dyn)
> .rel.got 0x00000000 0x0 cpu/i386/start.o
> .rel.plt 0x00000000 0x0 cpu/i386/start.o
> .rel.text 0x380589a0 0x2e28 cpu/i386/start.o
> .rel.start16 0x3805b7c8 0x10 cpu/i386/start.o
> .rel.data 0x3805b7d8 0xc18 cpu/i386/start.o
> .rel.rodata 0x3805c3f0 0x360 cpu/i386/start.o
> .rel.u_boot_cmd
> 0x3805c750 0x500 cpu/i386/start.o
> 0x3805cc50 __rel_dyn_end = .
>
>
> And the output of readelf...
>
> Section Headers:
> [Nr] Name Type Addr Off Size ES Flg Lk Inf Al
> [ 0] NULL 00000000 000000 000000 00 0 0 0
> [ 1] .text PROGBITS 38040000 001000 0118a4 00 AX 0 0 4
> [ 2] .rel.text REL 00000000 066c68 005d00 08 40 1 4
> [ 3] .rodata PROGBITS 380518a4 0128a4 005da5 00 A 0 0 4
> [ 4] .rel.rodata REL 00000000 06c968 000360 08 40 3 4
> [ 5] .interp PROGBITS 38057649 018649 000013 00 A 0 0 1
> [ 6] .dynstr STRTAB 3805765c 01865c 0001ee 00 A 0 0 1
> [ 7] .hash HASH 3805784c 01884c 0000cc 04 A 11 0 4
> [ 8] .data PROGBITS 38057918 018918 000a3c 00 WA 0 0 4
> [ 9] .rel.data REL 00000000 06ccc8 000c18 08 40 8 4
> [10] .got.plt PROGBITS 38058354 019354 00000c 04 WA 0 0 4
> [11] .dynsym DYNSYM 38058360 019360 000200 10 A 6 1 4
> [12] .dynamic DYNAMIC 38058560 019560 000080 08 WA 6 0 4
> [13] .u_boot_cmd PROGBITS 380585e0 0195e0 0003c0 00 WA 0 0 4
> [14] .rel.u_boot_cmd REL 00000000 06d8e0 000500 08 40 13 4
> [15] .bss NOBITS 3805cc50 01ec50 001a34 00 WA 0 0 4
> [16] .bios PROGBITS 00000000 01e000 00053e 00 AX 0 0 1
> [17] .rel.bios REL 00000000 06dde0 0000c0 08 40 16 4
> [18] .rel.dyn REL 380589a0 0199a0 0042b0 08 A 11 0 4
> [19] .start16 PROGBITS 0000f800 01e800 000110 00 AX 0 0 1
> [20] .rel.start16 REL 00000000 06dea0 000038 08 40 19 4
> [21] .resetvec PROGBITS 0000fff0 01eff0 000010 00 AX 0 0 1
> [22] .rel.resetvec REL 00000000 06ded8 000008 08 40 21 4
>
> ...
>
> Relocation section '.rel.text' at offset 0x66c68 contains 2976 entries:
> Offset Info Type Sym.Value Sym. Name
> 38040010 00000101 R_386_32 38040000 .text
> 3804001e 00000101 R_386_32 38040000 .text
> 38040028 00000101 R_386_32 38040000 .text
> 3804003f 00000101 R_386_32 38040000 .text
> 38040051 00000101 R_386_32 38040000 .text
> 38040075 00000101 R_386_32 38040000 .text
> 38040085 00000101 R_386_32 38040000 .text
> 3804009d 0003e602 R_386_PC32 380403fa load_uboot
> 380400a6 00000101 R_386_32 38040000 .text
> 38040015 00029f02 R_386_PC32 3804bdd8 early_board_init
> 38040023 0003f702 R_386_PC32 3804bdda show_boot_progress_asm
>
> ...
>
> Relocation section '.rel.rodata' at offset 0x6c968 contains 108 entries:
> Offset Info Type Sym.Value Sym. Name
> 38051908 00000201 R_386_32 380518a4 .rodata
> 38051938 00000201 R_386_32 380518a4 .rodata
> 38051968 00000201 R_386_32 380518a4 .rodata
> 38051998 00000201 R_386_32 380518a4 .rodata
> 380519c8 00000201 R_386_32 380518a4 .rodata
> 380519f8 00000201 R_386_32 380518a4 .rodata
>
> ...
>
> Relocation section '.rel.dyn' at offset 0x199a0 contains 2134 entries:
> Offset Info Type Sym.Value Sym. Name
> 0000f838 00000008 R_386_RELATIVE
> 0000f846 00000008 R_386_RELATIVE
> 38040010 00000008 R_386_RELATIVE
> 3804001e 00000008 R_386_RELATIVE
> 38040028 00000008 R_386_RELATIVE
> 3804003f 00000008 R_386_RELATIVE
> 38040051 00000008 R_386_RELATIVE
> 38040075 00000008 R_386_RELATIVE
> 38040085 00000008 R_386_RELATIVE
>
> Notice that, apart from .rel.dyn, non of the .rel.* sections have the
> A (Allocated) flag set - They do not end up in the stripped binary image.
> .rel.dyn is allocated in the binary image with all the R_386_PC32 entries
> from the other .rel section are discarded and the R_386_32 have been
> 'converted' to R_386_RELATIVE which are simple to adjust (locate in memory
> and adjust by the relocation offset)
>
> The relocation fixup is really easy:
>
> Elf32_Rel *rel_dyn_start = (Elf32_Rel *)&__rel_dyn_start;
> Elf32_Rel *rel_dyn_end = (Elf32_Rel *)&__rel_dyn_end;
> Elf32_Rel *re;
>
> for (re = rel_dyn_start; re < rel_dyn_end; re++)
> {
> if (re->r_offset >= TEXT_BASE)
> if (*(ulong *)re->r_offset >= TEXT_BASE)
> *(ulong *)(re->r_offset - rel_offset) -= (Elf32_Addr)rel_offset;
> }
>
> The size penalty is ~17kB of extra data (which is not copied to RAM) and
> a tiny amount of relocation code (easily offset by removal of other fixups
> such as the command table fixup
>
> Any without using the pic flag in gcc, there is no GOT and no associated
> performance penalty.
>
> Thanks for everyone's help (especially Jocke and Bill)
>
Great work Graeme. You have taken a lot of conjecture and guessing and
converted it to actual truth!
In line with your comment about -fpic, dropping -fpic shrinks the .text
section from 000137fc to 000118a4, a reduction of about 8k. The -fpic
build also contains a .rel.dyn section, which presumably needs to be
processed the same way as in the non -fpic case (otherwise, why would it
be there?). The size of that "residual" .rel.dyn was 00001228, or about
4.6k. This means that the size penalty for not using -fpic is only about
3k bytes total in the image, and the RAM footprint is actually smaller
than with -fpic.
So now, after Graeme's work here, it is easily possible to support three
different u-boot configurations: absolute, relocatable, and relocatable
with -fpic. If there are any size maniacs out there, we can reduce the
size of the relocation table at the expense of some post-processing.
These days, 9k of flash vs 4.5k of flash doesn't seem important, but I
imagine if you are right against the stops on an existing product it can
be very important!
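A rough check of that arithmetic, using only section sizes posted in
this thread. Which sections belong in the comparison is an assumption
here (.text and .rel.dyn for both builds, plus .got and .rel.got for the
-fpic build), and the -fpic figures come from the table in the first
message, so other small differences between the builds are ignored.

```c
#include <stdint.h>

/* Section sizes in bytes, copied from the tables posted in this thread. */
enum {
    TEXT_PIC     = 0x137fc, /* .text with -fpic */
    TEXT_NOPIC   = 0x118a4, /* .text, relocatable without -fpic */
    RELDYN_PIC   = 0x1228,  /* .rel.dyn with -fpic */
    RELDYN_NOPIC = 0x42b0,  /* .rel.dyn without -fpic */
    GOT_PIC      = 0x1f0,   /* .got, only populated with -fpic */
    RELGOT_PIC   = 0x3e0    /* .rel.got, only present with -fpic */
};

/* Extra image bytes when building without -fpic (assumed section set). */
static int32_t nopic_penalty(void)
{
    int32_t pic   = TEXT_PIC + RELDYN_PIC + GOT_PIC + RELGOT_PIC;
    int32_t nopic = TEXT_NOPIC + RELDYN_NOPIC;

    return nopic - pic;
}
```

With these inputs the difference is 2912 bytes, which is in line with
the "about 3k" estimate. Counting only .text and .rel.dyn would give
4400 bytes instead.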
It will be interesting to see similar numbers for other architectures. I
expect similar results, but you never know. PPC relocation entries are
larger, so they become more of an issue.
Still more questions for Graeme, if he will indulge me! Are the if
statements in the relocation code ever false? Are there relocations for
stuff below TEXT_BASE in the input binary? If so, do you have any idea
why? Not that two if statements are a big deal; it is just that I can't
explain why there would be any relocations below TEXT_BASE, and I can't
explain why there would be any relocatable references to anything below
TEXT_BASE. I assume this might be related to not relocating NULL
pointers, which would be reflected in the innermost if statement. I would
not expect there to be any such references, as gas does know the
relocation attributes of initialized data, and NULL is absolute(?). Also,
if a function is not defined (weak or otherwise), the loader should give
it an address of absolute 0, which would also not generate a relocation
entry(?). It would be interesting to intentionally call an undefined
function in u-boot and see if the call ends up relocatable. It should
not, and if it does we should file a bug report for ld!
Thanks again Graeme!
Best Regards,
Bill Campbell
> Regards,
>
> Graeme
>
>
* [U-Boot] Relocation size penalty calculation
2009-10-17 12:59 ` J. William Campbell
@ 2009-10-17 21:29 ` Graeme Russ
0 siblings, 0 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-17 21:29 UTC (permalink / raw)
To: u-boot
On Sat, Oct 17, 2009 at 11:59 PM, J. William Campbell
<jwilliamcampbell@comcast.net> wrote:
> Graeme Russ wrote:
>>
>> On Thu, Oct 15, 2009 at 3:45 AM, J. William Campbell
>> <jwilliamcampbell@comcast.net> wrote:
>>
>>>
>>> Joakim Tjernlund wrote:
>>>
>>
>>
>
> <megasnip>
>
[Yawn... YAS (Yet Another Snip) ;)]
>>
>> The relocation fixup is really easy:
>>
>> Elf32_Rel *rel_dyn_start = (Elf32_Rel *)&__rel_dyn_start;
>> Elf32_Rel *rel_dyn_end = (Elf32_Rel *)&__rel_dyn_end;
>> Elf32_Rel *re;
>>
>> for (re = rel_dyn_start; re < rel_dyn_end; re++)
>> {
>> if (re->r_offset >= TEXT_BASE)
>> if (*(ulong *)re->r_offset >= TEXT_BASE)
>> *(ulong *)(re->r_offset - rel_offset) -=
>> (Elf32_Addr)rel_offset;
>> }
>>
>> The size penalty is ~17kB of extra data (which is not copied to RAM) and
>> a tiny amount of relocation code (easily offset by removal of other fixups
>> such as the command table fixup
>>
>> Any without using the pic flag in gcc, there is no GOT and no associated
>> performance penalty.
>>
>> Thanks for everyone's help (especially Jocke and Bill)
>>
>
> Great work Graeme. You have taken a lot of conjecture and guessing and
> converted it to actual truth!
>
> In line with your comment about -fpic, the .text segment size goes from
> 000137fc down to 000118a4, or about an 8 k reduction in size. -fpic also
> contains a .rel_dyn segment, that presumably needs to be processed the same
> way as in the non -fpic case (otherwise, why would it be there?). The size
> of the "residual" .rel_dyn was 00001228, or 4.6 k. This means that the size
> penalty for not using -fpic is only about 3k bytes total in the image, and
> the ram footprint is actually smaller than with -fpic. So now, after
Yes, especially on the x86 because with -fpic, the x86 needs to do a CALL/POP
in the beginning of each function to determine the current IP in order to
calculate absolute addresses using the GOT (ouch!)
> Graeme's work here, it is easily possible to support three different u-boot
> configurations, absolute, relocatable, and relocatable with -fpic. If there
> are any size maniacs out there, we can reduce the size of the relocation
> table at the expense of some post-processing. These days, 9k of flash vs
> 4.5k of flash doesn't seem important, but I imagine if you are right against
> the stops on an existing product it can be very important!
>
> It will be interesting to see similar numbers for other architectures. I
> expect similar results, but you never know. PPC relocation entries are
> larger, so they become more of an issue.
>
> Still more questions for Graeme if he will indulge me! Are the if statements
> in the relocation code ever false? Are there relocations for stuff below
> TEXT_BASE in
> the input binary? If so, do you have any idea why? Not that two if
> statements are a big deal, it is just that I can't explain why there would
> be any relocations below TEXT_BASE, and I can't explain why there would be
> any relocatable references to anything below text base. . I assume this
> might be related to not relocating NULL pointers. That would be reflected in
> the innermost if statement. I would not expect there to be any such
> references, as gas does know the relocation attributes of initialized data,
> and NULL is absolute(?) Also, if a function is not defined (weak or
> otherwise), the loader should give it an address of absolute 0, which would
> also not generate a relocation entry(?). It would be interesting to
> intentionally call an un-defined function in u-boot and see if the call ends
> up relocatable. It should not, and if it does we should file a bug report
> for ld!
Apart from NULL pointers, there are some peculiarities for x86 that have
to be dealt with. There are two sections (for BIOS and the real mode
trampoline) which get linked at a hard-coded memory location in the low
area of memory (<16M). The TEXT_BASE checks are there to ensure these do
not get trampled.
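Spelled out as a sketch, the two tests amount to the following. The
function name is made up for this example, and TEXT_BASE is assumed to
equal the .text address from the readelf output posted earlier in the
thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: TEXT_BASE equals the .text address shown in
 * the readelf output earlier in the thread. */
#define TEXT_BASE 0x38040000u

/*
 * Mirror of the two tests in the fixup loop:
 *  - r_offset: where the word to patch lives
 *  - value:    the address currently stored in that word
 * Both must be at or above TEXT_BASE for the word to be adjusted.
 */
static bool should_adjust(uint32_t r_offset, uint32_t value)
{
    if (r_offset < TEXT_BASE)
        return false;   /* word lives in a low, fixed-address section */
    if (value < TEXT_BASE)
        return false;   /* NULL, or a pointer into a fixed-address section */
    return true;
}
```

For example, the entry at 0x0000f838 in the .rel.dyn listing fails the
first test, and a NULL pointer stored anywhere in the image fails the
second.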
>
> Thanks again Graeme!
>
NP - Just scratching an itch
> Best Regards,
> Bill Campbell
>>
>> Regards,
>>
>> Graeme
>>
>>
>
>
Thread overview: 47+ messages
2009-10-08 11:54 [U-Boot] Relocation size penalty calculation Graeme Russ
2009-10-08 14:14 ` Peter Tyser
2009-10-08 15:53 ` J. William Campbell
2009-10-08 16:15 ` Peter Tyser
2009-10-08 16:50 ` J. William Campbell
2009-10-08 15:58 ` J. William Campbell
2009-10-08 20:58 ` Graeme Russ
2009-10-08 21:23 ` Wolfgang Denk
2009-10-08 22:02 ` Graeme Russ
2009-10-08 22:20 ` Peter Tyser
2009-10-09 1:25 ` Mike Frysinger
2009-10-09 1:43 ` Graeme Russ
2009-10-08 22:27 ` J. William Campbell
2009-10-08 22:39 ` Graeme Russ
2009-10-08 23:12 ` Joakim Tjernlund
2009-10-09 0:09 ` J. William Campbell
2009-10-10 4:43 ` Graeme Russ
2009-10-10 8:07 ` Joakim Tjernlund
2009-10-10 8:46 ` Graeme Russ
2009-10-10 9:27 ` Joakim Tjernlund
2009-10-10 10:38 ` Graeme Russ
2009-10-10 10:47 ` Joakim Tjernlund
2009-10-10 11:21 ` Graeme Russ
2009-10-10 15:38 ` Joakim Tjernlund
2009-10-11 10:47 ` Graeme Russ
[not found] ` <OF83D1271F.04B67606-ONC125764C.0045BFF2-C125764C.0046AC45@transmode.se>
2009-10-13 11:21 ` Graeme Russ
2009-10-13 11:53 ` Joakim Tjernlund
2009-10-13 16:30 ` J. William Campbell
2009-10-13 16:55 ` Joakim Tjernlund
2009-10-13 20:06 ` Graeme Russ
[not found] ` <OF32A18F38.511FF11C-ONC125764E.00750716-C125764E.007534EE@ <4AD511E4.9090204@comcast.net>
2009-10-13 21:20 ` Joakim Tjernlund
2009-10-13 23:48 ` J. William Campbell
2009-10-14 7:25 ` Joakim Tjernlund
2009-10-14 11:48 ` Graeme Russ
2009-10-14 12:38 ` Joakim Tjernlund
2009-10-14 16:45 ` J. William Campbell
2009-10-17 5:17 ` Graeme Russ
2009-10-17 12:32 ` Joakim Tjernlund
2009-10-17 12:59 ` J. William Campbell
2009-10-17 21:29 ` Graeme Russ
2009-10-14 15:35 ` J. William Campbell
2009-10-14 16:05 ` Joakim Tjernlund
2009-10-14 16:49 ` J. William Campbell
[not found] ` <4AD0B3D7.7020900@comcast.net>
2009-10-11 1:31 ` Graeme Russ
2009-10-10 16:52 ` Mike Frysinger
2009-10-10 17:45 ` Joakim Tjernlund
2009-10-11 0:43 ` Graeme Russ