public inbox for u-boot@lists.denx.de
* [U-Boot] Relocation size penalty calculation
@ 2009-10-08 11:54 Graeme Russ
  2009-10-08 14:14 ` Peter Tyser
  2009-10-08 15:58 ` J. William Campbell
  0 siblings, 2 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-08 11:54 UTC (permalink / raw)
  To: u-boot

Out of curiosity, I wanted to see just how much of a size penalty I am
incurring by using gcc -fpic / ld -pie on my x86 u-boot build. Here are
the results (a fixed-width font will help; the table is formatted with
spaces, not tabs):

Section             non-reloc     reloc
---------------------------------------
.text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
.rodata              00005bad  000059d0
.interp              n/a       00000013
.dynstr              n/a       00000648
.hash                n/a       00000428
.eh_frame            00003268  000034fc
.data                00000a6c  000001dc
.data.rel            n/a       00000098
.data.rel.ro.local   n/a       00000178
.data.rel.local      n/a       000007e4
.got                 00000000  000001f0
.got.plt             n/a       0000000c
.rel.got             n/a       000003e0
.rel.dyn             n/a       00001228
.dynsym              n/a       00000850
.dynamic             n/a       00000080
.u_boot_cmd          000003c0  000003c0
.bss                 00001a34  00001a34
.realmode            00000166  00000166
.bios                0000053e  0000053e
=======================================
Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger

It's more than a 16% increase in size!

.text accounts for a little under half of the total bloat, and of that,
the crude dynamic loader accounts for only 341 bytes.
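
The totals and percentages can be re-derived from the per-section sizes.
The following is a minimal Python sketch that only re-adds the numbers in
the table above; the dictionary layout and variable names are made up for
illustration, and "n/a" entries are counted as zero bytes:

```python
# (non_reloc, reloc) section sizes in bytes, copied from the table above.
# None marks a section that does not exist in the non-relocatable build.
SECTIONS = {
    ".text": (0x118c4, 0x137fc),
    ".rodata": (0x5bad, 0x59d0),
    ".interp": (None, 0x13),
    ".dynstr": (None, 0x648),
    ".hash": (None, 0x428),
    ".eh_frame": (0x3268, 0x34fc),
    ".data": (0xa6c, 0x1dc),
    ".data.rel": (None, 0x98),
    ".data.rel.ro.local": (None, 0x178),
    ".data.rel.local": (None, 0x7e4),
    ".got": (0x0, 0x1f0),
    ".got.plt": (None, 0xc),
    ".rel.got": (None, 0x3e0),
    ".rel.dyn": (None, 0x1228),
    ".dynsym": (None, 0x850),
    ".dynamic": (None, 0x80),
    ".u_boot_cmd": (0x3c0, 0x3c0),
    ".bss": (0x1a34, 0x1a34),
    ".realmode": (0x166, 0x166),
    ".bios": (0x53e, 0x53e),
}


def total(column):
    """Sum one column of the table, treating missing sections as 0 bytes."""
    return sum(sizes[column] or 0 for sizes in SECTIONS.values())


base_total = total(0)                 # non-relocatable build
reloc_total = total(1)                # relocatable build
growth = reloc_total - base_total     # overall size penalty in bytes
text_growth = SECTIONS[".text"][1] - SECTIONS[".text"][0]

print(f"total growth: {growth:#x} bytes ({100 * growth / base_total:.1f}%)")
print(f".text share of growth: {100 * text_growth / growth:.1f}%")
```

The rest of the growth comes from the dynamic-linking sections (.rel.dyn,
.dynsym, .dynstr, .hash and friends) that only exist in the -fpic build.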

Have any metrics been done for PPC?

Regards,

Graeme


* [U-Boot] Relocation size penalty calculation
  2009-10-08 11:54 [U-Boot] Relocation size penalty calculation Graeme Russ
@ 2009-10-08 14:14 ` Peter Tyser
  2009-10-08 15:53   ` J. William Campbell
  2009-10-08 15:58 ` J. William Campbell
  1 sibling, 1 reply; 47+ messages in thread
From: Peter Tyser @ 2009-10-08 14:14 UTC (permalink / raw)
  To: u-boot

On Thu, 2009-10-08 at 22:54 +1100, Graeme Russ wrote:
> Out of curiosity, I wanted to see just how much of a size penalty I am
> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
> the results (fixed width font will help - its space, not tab, formatted):
> 
> Section             non-reloc     reloc
> ---------------------------------------
> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
> .rodata              00005bad  000059d0
> .interp              n/a       00000013
> .dynstr              n/a       00000648
> .hash                n/a       00000428
> .eh_frame            00003268  000034fc
> .data                00000a6c  000001dc
> .data.rel            n/a       00000098
> .data.rel.ro.local   n/a       00000178
> .data.rel.local      n/a       000007e4
> .got                 00000000  000001f0
> .got.plt             n/a       0000000c
> .rel.got             n/a       000003e0
> .rel.dyn             n/a       00001228
> .dynsym              n/a       00000850
> .dynamic             n/a       00000080
> .u_boot_cmd          000003c0  000003c0
> .bss                 00001a34  00001a34
> .realmode            00000166  00000166
> .bios                0000053e  0000053e
> =======================================
> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
> 
> Its more than a 16% increase in size!!!
> 
> .text accounts for a little under half of the total bloat, and of that,
> the crude dynamic loader accounts for only 341 bytes
> 
> Have any metrics been done for PPC?

Things actually improve a little bit when we use -mrelocatable and get
rid of all the manual "+= gd->reloc_off" fixups:

1) Top of mainline on XPedite5370:
   text	   data	    bss	    dec	    hex	filename
 308612	  24488	  33172	 366272	  596c0	u-boot

2) Top of "reloc" branch on XPedite5370 (ie -mrelocatable):
   text	   data	    bss	    dec	    hex	filename
 303704	  28644	  33156	 365504	  593c0	u-boot

For fun:
3) #2 but with s/-mrelocatable/-fpic/ (probably doesn't boot):
   text	   data	    bss	    dec	    hex	filename
 303704	  24472	  33156	 361332	  58374	u-boot


There may be some other changes that affect the size between mainline
and "reloc", but their sizes are in the same general ballpark.

Best,
Peter


* [U-Boot] Relocation size penalty calculation
  2009-10-08 14:14 ` Peter Tyser
@ 2009-10-08 15:53   ` J. William Campbell
  2009-10-08 16:15     ` Peter Tyser
  0 siblings, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-08 15:53 UTC (permalink / raw)
  To: u-boot

Peter Tyser wrote:
> On Thu, 2009-10-08 at 22:54 +1100, Graeme Russ wrote:
>   
>> Out of curiosity, I wanted to see just how much of a size penalty I am
>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>> the results (fixed width font will help - its space, not tab, formatted):
>>
>> Section             non-reloc     reloc
>> ---------------------------------------
>> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
>> .rodata              00005bad  000059d0
>> .interp              n/a       00000013
>> .dynstr              n/a       00000648
>> .hash                n/a       00000428
>> .eh_frame            00003268  000034fc
>> .data                00000a6c  000001dc
>> .data.rel            n/a       00000098
>> .data.rel.ro.local   n/a       00000178
>> .data.rel.local      n/a       000007e4
>> .got                 00000000  000001f0
>> .got.plt             n/a       0000000c
>> .rel.got             n/a       000003e0
>> .rel.dyn             n/a       00001228
>> .dynsym              n/a       00000850
>> .dynamic             n/a       00000080
>> .u_boot_cmd          000003c0  000003c0
>> .bss                 00001a34  00001a34
>> .realmode            00000166  00000166
>> .bios                0000053e  0000053e
>> =======================================
>> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
>>
>> Its more than a 16% increase in size!!!
>>
>> .text accounts for a little under half of the total bloat, and of that,
>> the crude dynamic loader accounts for only 341 bytes
>>
>> Have any metrics been done for PPC?
>>     
>
> Things actually improve a little bit when we use -mrelocatable and get
> rid of all the manual "+= gd->reloc_off" fixups:
>
> 1) Top of mainline on XPedite5370:
>    text	   data	    bss	    dec	    hex	filename
>  308612	  24488	  33172	 366272	  596c0	u-boot
>
> 2) Top of "reloc" branch on XPedite5370 (ie -mrelocatable):
>    text	   data	    bss	    dec	    hex	filename
>  303704	  28644	  33156	 365504	  593c0	u-boot
>
>   
Hi Peter,
     Just to be clear, the total text+data length of u-boot with the
"manual" relocations (#1) is LARGER than the text+data length of u-boot
with the "manual" relocations removed and the necessary centralized
relocation code added, along with any additional data sections required
by -mrelocatable (#2), by 768 (dec) bytes? And both cases (1 and 2)
work equivalently?

Best Regards,
Bill Campbell.
> For fun:
> 3) #2 but with s/-mrelocatable/-fpic/ (probably doesn't boot):
>    text	   data	    bss	    dec	    hex	filename
>  303704	  24472	  33156	 361332	  58374	u-boot
>
>
> There may be some other changes that affect the size between mainline
> and "reloc", but their sizes are in the same general ballpark.
>
> Best,
> Peter
>


* [U-Boot] Relocation size penalty calculation
  2009-10-08 11:54 [U-Boot] Relocation size penalty calculation Graeme Russ
  2009-10-08 14:14 ` Peter Tyser
@ 2009-10-08 15:58 ` J. William Campbell
  2009-10-08 20:58   ` Graeme Russ
  1 sibling, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-08 15:58 UTC (permalink / raw)
  To: u-boot

Graeme Russ wrote:
> Out of curiosity, I wanted to see just how much of a size penalty I am
> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
> the results (fixed width font will help - its space, not tab, formatted):
>
> Section             non-reloc     reloc
> ---------------------------------------
> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
> .rodata              00005bad  000059d0
> .interp              n/a       00000013
> .dynstr              n/a       00000648
> .hash                n/a       00000428
> .eh_frame            00003268  000034fc
> .data                00000a6c  000001dc
> .data.rel            n/a       00000098
> .data.rel.ro.local   n/a       00000178
> .data.rel.local      n/a       000007e4
> .got                 00000000  000001f0
> .got.plt             n/a       0000000c
> .rel.got             n/a       000003e0
> .rel.dyn             n/a       00001228
> .dynsym              n/a       00000850
> .dynamic             n/a       00000080
> .u_boot_cmd          000003c0  000003c0
> .bss                 00001a34  00001a34
> .realmode            00000166  00000166
> .bios                0000053e  0000053e
> =======================================
> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
>
> Its more than a 16% increase in size!!!
>
> .text accounts for a little under half of the total bloat, and of that,
> the crude dynamic loader accounts for only 341 bytes
>   
Hi Graeme,
       I would be interested in a third option (column), the x86 build 
with just -mrelocatable but NOT -fpic. It will not be definitive
because there will be extra code that references the GOT and missing 
code to do some of the relocation, but it would still be interesting.

Best Regards,
Bill Campbell
> Have any metrics been done for PPC?
>
> Regards,
>
> Graeme


* [U-Boot] Relocation size penalty calculation
  2009-10-08 15:53   ` J. William Campbell
@ 2009-10-08 16:15     ` Peter Tyser
  2009-10-08 16:50       ` J. William Campbell
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Tyser @ 2009-10-08 16:15 UTC (permalink / raw)
  To: u-boot

On Thu, 2009-10-08 at 08:53 -0700, J. William Campbell wrote:
> Peter Tyser wrote:
> > On Thu, 2009-10-08 at 22:54 +1100, Graeme Russ wrote:
> >   
> >> Out of curiosity, I wanted to see just how much of a size penalty I am
> >> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
> >> the results (fixed width font will help - its space, not tab, formatted):
> >>
> >> Section             non-reloc     reloc
> >> ---------------------------------------
> >> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
> >> .rodata              00005bad  000059d0
> >> .interp              n/a       00000013
> >> .dynstr              n/a       00000648
> >> .hash                n/a       00000428
> >> .eh_frame            00003268  000034fc
> >> .data                00000a6c  000001dc
> >> .data.rel            n/a       00000098
> >> .data.rel.ro.local   n/a       00000178
> >> .data.rel.local      n/a       000007e4
> >> .got                 00000000  000001f0
> >> .got.plt             n/a       0000000c
> >> .rel.got             n/a       000003e0
> >> .rel.dyn             n/a       00001228
> >> .dynsym              n/a       00000850
> >> .dynamic             n/a       00000080
> >> .u_boot_cmd          000003c0  000003c0
> >> .bss                 00001a34  00001a34
> >> .realmode            00000166  00000166
> >> .bios                0000053e  0000053e
> >> =======================================
> >> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
> >>
> >> Its more than a 16% increase in size!!!
> >>
> >> .text accounts for a little under half of the total bloat, and of that,
> >> the crude dynamic loader accounts for only 341 bytes
> >>
> >> Have any metrics been done for PPC?
> >>     
> >
> > Things actually improve a little bit when we use -mrelocatable and get
> > rid of all the manual "+= gd->reloc_off" fixups:
> >
> > 1) Top of mainline on XPedite5370:
> >    text	   data	    bss	    dec	    hex	filename
> >  308612	  24488	  33172	 366272	  596c0	u-boot
> >
> > 2) Top of "reloc" branch on XPedite5370 (ie -mrelocatable):
> >    text	   data	    bss	    dec	    hex	filename
> >  303704	  28644	  33156	 365504	  593c0	u-boot
> >
> >   
> Hi Peter,
>      Just to be clear, the total text+data length of u-boot with the 
> "manual" relocations (#1)  is LARGER than the text+data length of u-boot 
> with the "manual" relocations removed and the necessary centralized 
> relocation code added, along with any additional data sections required 
> by -mrelocateable (#2), by 768 (dec) bytes?

Hi Bill,
D'oh, looks like I chose a bad board as an example.  The XPedite5370
already had -mrelocatable defined in its own
board/xes/xpedite5370/config.mk in mainline, so the above comparison
should be ignored as both builds used -mrelocatable.

Here's some *real* results from the MPC8548CDS:
1) Top of mainline:
   text	   data	    bss	    dec	    hex	filename
 219968	  17052	  22992	 260012	  3f7ac	u-boot

2) Top of "reloc" branch (ie -mrelocatable)
   text	   data	    bss	    dec	    hex	filename
 219192	  20640	  22980	 262812	  4029c	u-boot

So the reloc branch is 2.7K bigger for the MPC8548CDS.

Best,
Peter


* [U-Boot] Relocation size penalty calculation
  2009-10-08 16:15     ` Peter Tyser
@ 2009-10-08 16:50       ` J. William Campbell
  0 siblings, 0 replies; 47+ messages in thread
From: J. William Campbell @ 2009-10-08 16:50 UTC (permalink / raw)
  To: u-boot

Peter Tyser wrote:
> On Thu, 2009-10-08 at 08:53 -0700, J. William Campbell wrote:
>   
>> Peter Tyser wrote:
>>     
>>> On Thu, 2009-10-08 at 22:54 +1100, Graeme Russ wrote:
>>>   
>>>       
>>>> Out of curiosity, I wanted to see just how much of a size penalty I am
>>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>>>> the results (fixed width font will help - its space, not tab, formatted):
>>>>
>>>> Section             non-reloc     reloc
>>>> ---------------------------------------
>>>> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
>>>> .rodata              00005bad  000059d0
>>>> .interp              n/a       00000013
>>>> .dynstr              n/a       00000648
>>>> .hash                n/a       00000428
>>>> .eh_frame            00003268  000034fc
>>>> .data                00000a6c  000001dc
>>>> .data.rel            n/a       00000098
>>>> .data.rel.ro.local   n/a       00000178
>>>> .data.rel.local      n/a       000007e4
>>>> .got                 00000000  000001f0
>>>> .got.plt             n/a       0000000c
>>>> .rel.got             n/a       000003e0
>>>> .rel.dyn             n/a       00001228
>>>> .dynsym              n/a       00000850
>>>> .dynamic             n/a       00000080
>>>> .u_boot_cmd          000003c0  000003c0
>>>> .bss                 00001a34  00001a34
>>>> .realmode            00000166  00000166
>>>> .bios                0000053e  0000053e
>>>> =======================================
>>>> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
>>>>
>>>> Its more than a 16% increase in size!!!
>>>>
>>>> .text accounts for a little under half of the total bloat, and of that,
>>>> the crude dynamic loader accounts for only 341 bytes
>>>>
>>>> Have any metrics been done for PPC?
>>>>     
>>>>         
>>> Things actually improve a little bit when we use -mrelocatable and get
>>> rid of all the manual "+= gd->reloc_off" fixups:
>>>
>>> 1) Top of mainline on XPedite5370:
>>>    text	   data	    bss	    dec	    hex	filename
>>>  308612	  24488	  33172	 366272	  596c0	u-boot
>>>
>>> 2) Top of "reloc" branch on XPedite5370 (ie -mrelocatable):
>>>    text	   data	    bss	    dec	    hex	filename
>>>  303704	  28644	  33156	 365504	  593c0	u-boot
>>>
>>>   
>>>       
>> Hi Peter,
>>      Just to be clear, the total text+data length of u-boot with the 
>> "manual" relocations (#1)  is LARGER than the text+data length of u-boot 
>> with the "manual" relocations removed and the necessary centralized 
>> relocation code added, along with any additional data sections required 
>> by -mrelocateable (#2), by 768 (dec) bytes?
>>     
>
> Hi Bill,
> Doah, looks like I chose a bad board as an example.  The XPedite5370
> already had -mrelocatable defined in its own
> board/xes/xpedite5370/config.mk in mainline, so the above comparison
> should be ignored as both builds used -mrelocatable.
>
> Here's some *real* results from the MPC8548CDS:
> 1) Top of mainline:
>    text	   data	    bss	    dec	    hex	filename
>  219968	  17052	  22992	 260012	  3f7ac	u-boot
>
> 2) Top of "reloc" branch (ie -mrelocatable)
>    text	   data	    bss	    dec	    hex	filename
>  219192	  20640	  22980	 262812	  4029c	u-boot
>
> So the reloc branch is 2.7K bigger for the MPC8548CDS.
>   
Hi Peter,
     OK, that's more like it! A 1.2% size increase in ROM seems like a
very small price to pay for a truly relocatable u-boot image that will
run on any size memory without the programmer having to actively worry
about what may need relocating as code is written. Also, it should be
noted that the size increase in 2) is mostly in relocation segments
that do not need to be copied into RAM, so the RAM footprint should be
smaller for 2) than 1). The relocation code itself could also be placed
in a segment that is not copied into RAM, although that may be more
trouble than it is worth.
       I am looking forward to Graeme's results with the 386. I expect 
that it will not be quite so favorable, perhaps a 4 or 5% size increase 
for -mrelocatable over an absolute build. However, -mrelocatable vs. 
-fpic may be comparable, with -mrelocatable actually winning. But then 
again, I could be totally wrong!
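
The 1.2% figure can be checked against the MPC8548CDS `size` output quoted
above, if one assumes that only text and data occupy ROM (bss is
zero-filled at run time and stores nothing in flash). A quick Python check
under that assumption, with dictionary and function names chosen only for
this sketch:

```python
# MPC8548CDS numbers copied from the `size` output quoted above (bytes).
MAINLINE = {"text": 219968, "data": 17052, "bss": 22992}
RELOC = {"text": 219192, "data": 20640, "bss": 22980}


def rom_bytes(build):
    """Bytes stored in flash, assuming bss takes no ROM space."""
    return build["text"] + build["data"]


def dec_total(build):
    """The 'dec' column printed by size: text + data + bss."""
    return build["text"] + build["data"] + build["bss"]


rom_delta = rom_bytes(RELOC) - rom_bytes(MAINLINE)
rom_pct = 100 * rom_delta / rom_bytes(MAINLINE)
dec_delta = dec_total(RELOC) - dec_total(MAINLINE)

print(f"ROM (text+data) growth: {rom_delta} bytes ({rom_pct:.1f}%)")
print(f"'dec' growth including bss: {dec_delta} bytes")
```

Note that text actually shrinks slightly in the reloc build; the growth is
all in data, which is where the relocation tables would be counted.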

Best Regards,
Bill Campbell
> Best,
> Peter
>
>
>
>   


* [U-Boot] Relocation size penalty calculation
  2009-10-08 15:58 ` J. William Campbell
@ 2009-10-08 20:58   ` Graeme Russ
  2009-10-08 21:23     ` Wolfgang Denk
  2009-10-08 22:27     ` J. William Campbell
  0 siblings, 2 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-08 20:58 UTC (permalink / raw)
  To: u-boot

On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
<jwilliamcampbell@comcast.net> wrote:
> Graeme Russ wrote:
>>
>> Out of curiosity, I wanted to see just how much of a size penalty I am
>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>> the results (fixed width font will help - its space, not tab, formatted):
>>
>> Section             non-reloc     reloc
>> ---------------------------------------
>> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
>> .rodata              00005bad  000059d0
>> .interp              n/a       00000013
>> .dynstr              n/a       00000648
>> .hash                n/a       00000428
>> .eh_frame            00003268  000034fc
>> .data                00000a6c  000001dc
>> .data.rel            n/a       00000098
>> .data.rel.ro.local   n/a       00000178
>> .data.rel.local      n/a       000007e4
>> .got                 00000000  000001f0
>> .got.plt             n/a       0000000c
>> .rel.got             n/a       000003e0
>> .rel.dyn             n/a       00001228
>> .dynsym              n/a       00000850
>> .dynamic             n/a       00000080
>> .u_boot_cmd          000003c0  000003c0
>> .bss                 00001a34  00001a34
>> .realmode            00000166  00000166
>> .bios                0000053e  0000053e
>> =======================================
>> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
>>
>> Its more than a 16% increase in size!!!
>>
>> .text accounts for a little under half of the total bloat, and of that,
>> the crude dynamic loader accounts for only 341 bytes
>>
>
> Hi Graeme,
>      I would be interested in a third option (column), the x86 build with
> just -mrelocateable but NOT -fpic. It will not be definitive because there
> will be extra code that references the GOT and missing code to do some of
> the relocation, but it would still be interesting.

x86 does not have -mrelocatable. This is a PPC only option :(


>
> Best Regards,
> Bill Campbell
>>
>> Have any metrics been done for PPC?
>>
>> Regards,
>>
>> Graeme

Once the reloc branch has been merged, how many arches are left which do
not support relocation?

Regards,

Graeme


* [U-Boot] Relocation size penalty calculation
  2009-10-08 20:58   ` Graeme Russ
@ 2009-10-08 21:23     ` Wolfgang Denk
  2009-10-08 22:02       ` Graeme Russ
  2009-10-08 22:27     ` J. William Campbell
  1 sibling, 1 reply; 47+ messages in thread
From: Wolfgang Denk @ 2009-10-08 21:23 UTC (permalink / raw)
  To: u-boot

Dear Graeme Russ,

In message <d66caabb0910081358h5b013922tf7f9dce4cce41c64@mail.gmail.com> you wrote:
>
> 
> Once the reloc branch has been merged, how many arches are left which do
> not support relocation?

All but PPC ?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
There comes to all races an ultimate crisis which  you  have  yet  to
face  ....  One  day  our  minds became so powerful we dared think of
ourselves as gods.
	-- Sargon, "Return to Tomorrow", stardate 4768.3


* [U-Boot] Relocation size penalty calculation
  2009-10-08 21:23     ` Wolfgang Denk
@ 2009-10-08 22:02       ` Graeme Russ
  2009-10-08 22:20         ` Peter Tyser
  0 siblings, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-08 22:02 UTC (permalink / raw)
  To: u-boot

On Fri, Oct 9, 2009 at 8:23 AM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Graeme Russ,
>
> In message <d66caabb0910081358h5b013922tf7f9dce4cce41c64@mail.gmail.com> you wrote:
>>
>>
>> Once the reloc branch has been merged, how many arches are left which do
>> not support relocation?
>
> All but PPC ?

Hmm, so commit 0630535e2d062dd73c1ceca5c6125c86d1127a49 is all about
removing code that is not used because these arches do not do any
relocation at all?

So ultimately, what we are looking at is the complete and utter
removal of any code which references a relocation adjustment in lieu
of each arch either:

  a) Execute in Place from Flash, or;
  b) Setting a fixed TEXT_BASE at a known RAM location and copying
     the contents of Flash to RAM, or;
  c) Implementing full Relocation

>
> Best regards,
>
> Wolfgang Denk
>

Regards,

Graeme


* [U-Boot] Relocation size penalty calculation
  2009-10-08 22:02       ` Graeme Russ
@ 2009-10-08 22:20         ` Peter Tyser
  2009-10-09  1:25           ` Mike Frysinger
  2009-10-09  1:43           ` Graeme Russ
  0 siblings, 2 replies; 47+ messages in thread
From: Peter Tyser @ 2009-10-08 22:20 UTC (permalink / raw)
  To: u-boot

On Fri, 2009-10-09 at 09:02 +1100, Graeme Russ wrote:
> On Fri, Oct 9, 2009 at 8:23 AM, Wolfgang Denk <wd@denx.de> wrote:
> > Dear Graeme Russ,
> >
> > In message <d66caabb0910081358h5b013922tf7f9dce4cce41c64@mail.gmail.com> you wrote:
> >>
> >>
> >> Once the reloc branch has been merged, how many arches are left which do
> >> not support relocation?
> >
> > All but PPC ?
> 
> Hmm, so commit 0630535e2d062dd73c1ceca5c6125c86d1127a49 is all about
> removing code that is not used because these arches do not do any
> relocation at all?

I sent that patch/RFC after noticing none of those architectures
performed manual relocation fixups, thus they could save some code space
by defining CONFIG_RELOC_FIXUP_WORKS.  Similarly the gd->reloc_off field
was no longer needed for them.

I don't know whether or how those architectures relocate, just
that they didn't need relocation fixups.  So that was the logic...

> So ultimately, what we are looking at is the complete and utter
> removal of any code which references a relocation adjustment in lieu
> of each arch either:
> 
>   a) Execute in Place from Flash, or;
>   b) Setting a fixed TEXT_BASE at a known RAM location and copying
>      the contents of Flash to RAM, or;
>   c) Implementing full Relocation

d) Leaving those architectures the way they are now
could also be added, in case a), b), and c) won't work for some reason.

I think it would be great to remove any manual relocation adjustments in
the long run.  This isn't strictly necessary though, as we can still
have manual relocations littering the code; it's just a bit dirty and
prone to issues in the long run.
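
To make "manual relocation adjustments" concrete, here is a toy Python
model (not U-Boot code) of what such a fixup amounts to: a statically
initialised table still holds link-time addresses after the image has
been copied to RAM, so something has to add the relocation offset to each
stored pointer. The addresses, the table contents, and the helper name
`fixup_table` are all hypothetical.

```python
# Toy model of "+= reloc_off"-style manual fixups; every address is made up.
LINK_BASE = 0xFFF80000   # hypothetical flash address the image was linked for
LOAD_BASE = 0x0FF00000   # hypothetical RAM address the image was copied to
RELOC_OFF = (LOAD_BASE - LINK_BASE) & 0xFFFFFFFF   # 32-bit wrap-around


def fixup_table(table, reloc_off):
    """Return a copy of `table` with every stored address moved by reloc_off."""
    return {name: (addr + reloc_off) & 0xFFFFFFFF for name, addr in table.items()}


# A statically initialised table still holds link-time (flash) addresses.
cmd_table = {"bootm": 0xFFF81000, "help": 0xFFF81240}

fixed = fixup_table(cmd_table, RELOC_OFF)
for name in cmd_table:
    print(f"{name}: {cmd_table[name]:#010x} -> {fixed[name]:#010x}")
```

In this model every table of pointers needs its own fixup call, and a
forgotten one leaves stale flash addresses behind, which is the kind of
issue a centralized relocation pass is meant to avoid.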

So my vote would be to shoot for c) for all arches, but I have no idea
what impact that would have on them:)

Best,
Peter


* [U-Boot] Relocation size penalty calculation
  2009-10-08 20:58   ` Graeme Russ
  2009-10-08 21:23     ` Wolfgang Denk
@ 2009-10-08 22:27     ` J. William Campbell
  2009-10-08 22:39       ` Graeme Russ
  1 sibling, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-08 22:27 UTC (permalink / raw)
  To: u-boot

Graeme Russ wrote:
> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
> <jwilliamcampbell@comcast.net> wrote:
>   
>> Graeme Russ wrote:
>>     
>>> Out of curiosity, I wanted to see just how much of a size penalty I am
>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>>> the results (fixed width font will help - its space, not tab, formatted):
>>>
>>> Section             non-reloc     reloc
>>> ---------------------------------------
>>> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
>>> .rodata              00005bad  000059d0
>>> .interp              n/a       00000013
>>> .dynstr              n/a       00000648
>>> .hash                n/a       00000428
>>> .eh_frame            00003268  000034fc
>>> .data                00000a6c  000001dc
>>> .data.rel            n/a       00000098
>>> .data.rel.ro.local   n/a       00000178
>>> .data.rel.local      n/a       000007e4
>>> .got                 00000000  000001f0
>>> .got.plt             n/a       0000000c
>>> .rel.got             n/a       000003e0
>>> .rel.dyn             n/a       00001228
>>> .dynsym              n/a       00000850
>>> .dynamic             n/a       00000080
>>> .u_boot_cmd          000003c0  000003c0
>>> .bss                 00001a34  00001a34
>>> .realmode            00000166  00000166
>>> .bios                0000053e  0000053e
>>> =======================================
>>> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
>>>
>>> Its more than a 16% increase in size!!!
>>>
>>> .text accounts for a little under half of the total bloat, and of that,
>>> the crude dynamic loader accounts for only 341 bytes
>>>
>>>       
>> Hi Graeme,
>>      I would be interested in a third option (column), the x86 build with
>> just -mrelocateable but NOT -fpic. It will not be definitive because there
>> will be extra code that references the GOT and missing code to do some of
>> the relocation, but it would still be interesting.
>>     
>
> x86 does not have -mrelocatable. This is a PPC only option :(
>   
Hi Graeme,
           You are unfortunately correct. However, I wonder if we can 
get essentially the same result by executing the final ld step with the 
--emit-relocs switch included. This may also include some "extra" 
sections that we would want to strip out, but if it works, it could give 
all ELF-based systems a way to a relocatable u-boot.

Best Regards,
Bill Campbell
>
>   
>> Best Regards,
>> Bill Campbell
>>     
>>> Have any metrics been done for PPC?
>>>
>>> Regards,
>>>
>>> Graeme
>>>       
>
> Once the reloc branch has been merged, how many arches are left which do
> not support relocation?
>
> Regards,
>
> Graeme
>
>
>   


* [U-Boot] Relocation size penalty calculation
  2009-10-08 22:27     ` J. William Campbell
@ 2009-10-08 22:39       ` Graeme Russ
  2009-10-08 23:12         ` Joakim Tjernlund
  0 siblings, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-08 22:39 UTC (permalink / raw)
  To: u-boot

On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
<jwilliamcampbell@comcast.net> wrote:
> Graeme Russ wrote:
>>
>> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
>> <jwilliamcampbell@comcast.net> wrote:
>>
>>>
>>> Graeme Russ wrote:
>>>
>>>>
>>>> Out of curiosity, I wanted to see just how much of a size penalty I am
>>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>>>> the results (fixed width font will help - its space, not tab,
>>>> formatted):
>>>>
>>>> Section             non-reloc     reloc
>>>> ---------------------------------------
>>>> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
>>>> .rodata              00005bad  000059d0
>>>> .interp              n/a       00000013
>>>> .dynstr              n/a       00000648
>>>> .hash                n/a       00000428
>>>> .eh_frame            00003268  000034fc
>>>> .data                00000a6c  000001dc
>>>> .data.rel            n/a       00000098
>>>> .data.rel.ro.local   n/a       00000178
>>>> .data.rel.local      n/a       000007e4
>>>> .got                 00000000  000001f0
>>>> .got.plt             n/a       0000000c
>>>> .rel.got             n/a       000003e0
>>>> .rel.dyn             n/a       00001228
>>>> .dynsym              n/a       00000850
>>>> .dynamic             n/a       00000080
>>>> .u_boot_cmd          000003c0  000003c0
>>>> .bss                 00001a34  00001a34
>>>> .realmode            00000166  00000166
>>>> .bios                0000053e  0000053e
>>>> =======================================
>>>> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
>>>>
>>>> Its more than a 16% increase in size!!!
>>>>
>>>> .text accounts for a little under half of the total bloat, and of that,
>>>> the crude dynamic loader accounts for only 341 bytes
>>>>
>>>>
>>>
>>> Hi Graeme,
>>>     I would be interested in a third option (column), the x86 build with
>>> just -mrelocateable but NOT -fpic. It will not be definitive because
>>> there
>>> will be extra code that references the GOT and missing code to do some of
>>> the relocation, but it would still be interesting.
>>>
>>
>> x86 does not have -mrelocatable. This is a PPC only option :(
>>
>
> Hi Graeme,
>          You are unfortunately correct. However, I wonder if we can get
> essentially the same result by executing the final ld step with the
> --emit-relocs switch included. This may also include some "extra" sections
> that we would want to strip out, but if it works, it could give all
> ELF-based systems a way to a relocatable u-boot.
>

I don't think --emit-relocs is necessary with -pie. I haven't gone through
all the permutations to see if there is a smaller option, but gcc -fpic and
ld -pie create enough information to perform relocation on the x86
platform.
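
As a rough illustration of what "enough information" means, the sketch
below walks a .rel.dyn-style table of 8-byte Elf32_Rel entries (r_offset,
r_info) and applies the R_386_RELATIVE ones by adding the load offset to
the 32-bit word at each r_offset. It is a Python model under stated
assumptions, not the actual x86 u-boot loader: the function name, the
addresses, and the choice to skip every other relocation type are
illustrative only.

```python
import struct

R_386_RELATIVE = 8  # relocation type number from the i386 ELF psABI


def apply_relative_relocs(image, rel_dyn, link_base, load_base):
    """Patch `image` (linked at link_base) in place so it can run at load_base.

    rel_dyn is a bytes object of little-endian Elf32_Rel entries.
    Returns the number of words patched.
    """
    delta = (load_base - link_base) & 0xFFFFFFFF
    patched = 0
    for off in range(0, len(rel_dyn), 8):
        r_offset, r_info = struct.unpack_from("<II", rel_dyn, off)
        if (r_info & 0xFF) != R_386_RELATIVE:   # ELF32_R_TYPE(r_info)
            continue                            # other types ignored in this sketch
        pos = r_offset - link_base              # virtual address -> image offset
        (value,) = struct.unpack_from("<I", image, pos)
        struct.pack_into("<I", image, pos, (value + delta) & 0xFFFFFFFF)
        patched += 1
    return patched


# Tiny demo: one pointer stored at image offset 4, pointing at offset 8.
demo_image = bytearray(16)
struct.pack_into("<I", demo_image, 4, 0x00100008)
demo_rel = struct.pack("<II", 0x00100004, R_386_RELATIVE)
demo_count = apply_relative_relocs(demo_image, demo_rel, 0x00100000, 0x03F00000)
(demo_value,) = struct.unpack_from("<I", demo_image, 4)
print(demo_count, hex(demo_value))
```

If every entry in the 0x1228-byte .rel.dyn from the table were an 8-byte
Elf32_Rel, that section would hold 581 entries.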

Regards,

Graeme


> Best Regards,
> Bill Campbell
> **
>>
>>
>>>
>>> Best Regards,
>>> Bill Campbell
>>>
>>>>
>>>> Have any metrics been done for PPC?
>>>>
>>>> Regards,
>>>>
>>>> Graeme
>>>>
>>
>> Once the reloc branch has been merged, how many arches are left which do
>> not support relocation?
>>
>> Regards,
>>
>> Graeme
>>
>>
>>
>
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-08 22:39       ` Graeme Russ
@ 2009-10-08 23:12         ` Joakim Tjernlund
  2009-10-09  0:09           ` J. William Campbell
  2009-10-10  4:43           ` Graeme Russ
  0 siblings, 2 replies; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-08 23:12 UTC (permalink / raw)
  To: u-boot

>
> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
> <jwilliamcampbell@comcast.net> wrote:
> > Graeme Russ wrote:
> >>
> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
> >> <jwilliamcampbell@comcast.net> wrote:
> >>
> >>>
> >>> Graeme Russ wrote:
> >>>
> >>>>
> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
> >>>> the results (fixed width font will help - its space, not tab,
> >>>> formatted):
> >>>>
> >>>> Section             non-reloc     reloc
> >>>> ---------------------------------------
> >>>> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
> >>>> .rodata              00005bad  000059d0
> >>>> .interp              n/a       00000013
> >>>> .dynstr              n/a       00000648
> >>>> .hash                n/a       00000428
> >>>> .eh_frame            00003268  000034fc
> >>>> .data                00000a6c  000001dc
> >>>> .data.rel            n/a       00000098
> >>>> .data.rel.ro.local   n/a       00000178
> >>>> .data.rel.local      n/a       000007e4
> >>>> .got                 00000000  000001f0
> >>>> .got.plt             n/a       0000000c
> >>>> .rel.got             n/a       000003e0
> >>>> .rel.dyn             n/a       00001228
> >>>> .dynsym              n/a       00000850
> >>>> .dynamic             n/a       00000080
> >>>> .u_boot_cmd          000003c0  000003c0
> >>>> .bss                 00001a34  00001a34
> >>>> .realmode            00000166  00000166
> >>>> .bios                0000053e  0000053e
> >>>> =======================================
> >>>> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
> >>>>
> >>>> Its more than a 16% increase in size!!!
> >>>>
> >>>> .text accounts for a little under half of the total bloat, and of that,
> >>>> the crude dynamic loader accounts for only 341 bytes
> >>>>
> >>>>
> >>>
> >>> Hi Graeme,
> >>>     I would be interested in a third option (column), the x86 build with
> >>> just -mrelocateable but NOT -fpic. It will not be definitive because
> >>> there
> >>> will be extra code that references the GOT and missing code to do some of
> >>> the relocation, but it would still be interesting.
> >>>
> >>
> >> x86 does not have -mrelocatable. This is a PPC only option :(
> >>
> >
> > Hi Graeme,
> >          You are unfortunately correct. However, I wonder if we can get
> > essentially the same result by executing the final ld step with the
> > --emit-relocs switch included. This may also include some "extra" sections
> > that we would want to strip out, but if it works, it could give all
> > ELF-based systems a way to a relocatable u-boot.
> >
>
> I don't think --emit-relocs is necessary with -pic. I haven't gone through
> all the permutations to see if there is a smaller option, but gcc -fpic and
> ld -pie creates enough information to perform relocation on the x86
> platform

Try -fvisibility=hidden

 Jocke

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-08 23:12         ` Joakim Tjernlund
@ 2009-10-09  0:09           ` J. William Campbell
  2009-10-10  4:43           ` Graeme Russ
  1 sibling, 0 replies; 47+ messages in thread
From: J. William Campbell @ 2009-10-09  0:09 UTC (permalink / raw)
  To: u-boot

Joakim Tjernlund wrote:
>> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
>> <jwilliamcampbell@comcast.net> wrote:
>>     
>>> Graeme Russ wrote:
>>>       
>>>> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
>>>> <jwilliamcampbell@comcast.net> wrote:
>>>>
>>>>         
>>>>> Graeme Russ wrote:
>>>>>
>>>>>           
>>>>>> Out of curiosity, I wanted to see just how much of a size penalty I am
>>>>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>>>>>> the results (fixed width font will help - its space, not tab,
>>>>>> formatted):
>>>>>>
>>>>>> Section             non-reloc     reloc
>>>>>> ---------------------------------------
>>>>>> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
>>>>>> .rodata              00005bad  000059d0
>>>>>> .interp              n/a       00000013
>>>>>> .dynstr              n/a       00000648
>>>>>> .hash                n/a       00000428
>>>>>> .eh_frame            00003268  000034fc
>>>>>> .data                00000a6c  000001dc
>>>>>> .data.rel            n/a       00000098
>>>>>> .data.rel.ro.local   n/a       00000178
>>>>>> .data.rel.local      n/a       000007e4
>>>>>> .got                 00000000  000001f0
>>>>>> .got.plt             n/a       0000000c
>>>>>> .rel.got             n/a       000003e0
>>>>>> .rel.dyn             n/a       00001228
>>>>>> .dynsym              n/a       00000850
>>>>>> .dynamic             n/a       00000080
>>>>>> .u_boot_cmd          000003c0  000003c0
>>>>>> .bss                 00001a34  00001a34
>>>>>> .realmode            00000166  00000166
>>>>>> .bios                0000053e  0000053e
>>>>>> =======================================
>>>>>> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
>>>>>>
>>>>>> Its more than a 16% increase in size!!!
>>>>>>
>>>>>> .text accounts for a little under half of the total bloat, and of that,
>>>>>> the crude dynamic loader accounts for only 341 bytes
>>>>>>
>>>>>>
>>>>>>             
>>>>> Hi Graeme,
>>>>>     I would be interested in a third option (column), the x86 build with
>>>>> just -mrelocateable but NOT -fpic. It will not be definitive because
>>>>> there
>>>>> will be extra code that references the GOT and missing code to do some of
>>>>> the relocation, but it would still be interesting.
>>>>>
>>>>>           
>>>> x86 does not have -mrelocatable. This is a PPC only option :(
>>>>
>>>>         
>>> Hi Graeme,
>>>          You are unfortunately correct. However, I wonder if we can get
>>> essentially the same result by executing the final ld step with the
>>> --emit-relocs switch included. This may also include some "extra" sections
>>> that we would want to strip out, but if it works, it could give all
>>> ELF-based systems a way to a relocatable u-boot.
>>>
>>>       
>> I don't think --emit-relocs is necessary with -pic. I haven't gone through
>> all the permutations to see if there is a smaller option, but gcc -fpic and
>> ld -pie creates enough information to perform relocation on the x86
>> platform
>>     
>
>   
It is true that --emit-relocs is not required when -fpic and -pie are
used instead. However, -fpic and -pie are designed to allow shared code
(libraries) to appear at different logical addresses in several
programs without altering the text. This is grand overkill for what we
need, which is the ability to relocate the code. The -fpic and -pie code
will be larger than the code built without them. How much larger is a
good question. On the PPC, it is larger but not much larger, because
there are lots of registers available and one of them has almost
certainly got (no pun intended) the magic relocation constant(s) in it.
On the 386, with many fewer registers, -fpic and -pie will make the code
larger, percentage-wise, than on the PPC. Thus avoiding -fpic and -pie
is a Good Thing in most cases.
> Try -fvisibility=hidden
>   
I assume the -fvisibility=hidden is suggested in order to reduce 
(eliminate) the symbol table from the output, which we don't need 
because there are assumed to be no undefined symbols in our final ld. If 
that works, great! I was assuming we might need a custom "strip" program 
to delete any sections that we don't need, but this sounds easier if it 
gets them all.
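
For reference, a small sketch of what the option changes at the source
level. With -fvisibility=hidden, symbols that a translation unit defines
without an explicit visibility attribute are treated as if they were
annotated as below. Hidden symbols are bound inside the linked image, so
they need no .dynsym entries and -fpic code does not have to fetch their
addresses from GOT slots. The names (HIDDEN, reloc_demo_counter,
reloc_demo_bump) are made up for illustration and are not U-Boot code.

```c
#include <assert.h>

/* Hypothetical shorthand for GCC's hidden-visibility attribute. */
#define HIDDEN __attribute__((visibility("hidden")))

/* Hypothetical global; hidden, so not exported from the linked image. */
HIDDEN int reloc_demo_counter;

/* Hypothetical function; callers in the same image bind to it directly. */
HIDDEN int reloc_demo_bump(int step)
{
	assert(step > 0);
	reloc_demo_counter += step;
	return reloc_demo_counter;
}
```

Whether this removes every unneeded section is a separate question; it
only affects how symbols are exported and referenced.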

Best Regards,
Bill Campbell
>  Jocke
>
>
>
>   

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-08 22:20         ` Peter Tyser
@ 2009-10-09  1:25           ` Mike Frysinger
  2009-10-09  1:43           ` Graeme Russ
  1 sibling, 0 replies; 47+ messages in thread
From: Mike Frysinger @ 2009-10-09  1:25 UTC (permalink / raw)
  To: u-boot

On Thursday 08 October 2009 18:20:18 Peter Tyser wrote:
> On Fri, 2009-10-09 at 09:02 +1100, Graeme Russ wrote:
> > On Fri, Oct 9, 2009 at 8:23 AM, Wolfgang Denk <wd@denx.de> wrote:
> > > Graeme Russ wrote:
> > >> Once the reloc branch has been merged, how many arches are left which
> > >> do not support relocation?
> > >
> > > All but PPC ?
> >
> > Hmm, so commit 0630535e2d062dd73c1ceca5c6125c86d1127a49 is all about
> > removing code that is not used because these arches do not do any
> > relocation at all?
> 
> I sent that patch/RFC after noticing none of those architectures
> performed manual relocation fixups, thus they could save some code space
> by defining CONFIG_RELOC_FIXUP_WORKS.  Similarly the gd->reloc_off field
> was no longer needed for them.
> 
> I'm not familiar with if or how those architectures are relocating, just
> that they didn't need relocation fixups.  So that was the logic...

The usage in the Blackfin port is most likely a copy & paste of existing code.
Deleting malloc_bin_reloc() from lib_blackfin/board.c and adding
CONFIG_RELOC_FIXUP_WORKS results in a working boot.  I've never really looked
into relocation as no one has asked for it.
-mike

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-08 22:20         ` Peter Tyser
  2009-10-09  1:25           ` Mike Frysinger
@ 2009-10-09  1:43           ` Graeme Russ
  1 sibling, 0 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-09  1:43 UTC (permalink / raw)
  To: u-boot

On Fri, Oct 9, 2009 at 9:20 AM, Peter Tyser <ptyser@xes-inc.com> wrote:
> On Fri, 2009-10-09 at 09:02 +1100, Graeme Russ wrote:
>> On Fri, Oct 9, 2009 at 8:23 AM, Wolfgang Denk <wd@denx.de> wrote:
>> > Dear Graeme Russ,
>> >
>> > In message <d66caabb0910081358h5b013922tf7f9dce4cce41c64@mail.gmail.com> you wrote:
>> >>
>> >>
>> >> Once the reloc branch has been merged, how many arches are left which do
>> >> not support relocation?
>> >
>> > All but PPC ?
>>
>> Hmm, so commit 0630535e2d062dd73c1ceca5c6125c86d1127a49 is all about
>> removing code that is not used because these arches do not do any
>> relocation at all?
>
> I sent that patch/RFC after noticing none of those architectures
> performed manual relocation fixups, thus they could save some code space
> by defining CONFIG_RELOC_FIXUP_WORKS.  Similarly the gd->reloc_off field
> was no longer needed for them.

Maybe CONFIG_RELOC_NOT_IMPLEMENTED would be more descriptive.

>
> I'm not familiar with if or how those architectures are relocating, just
> that they didn't need relocation fixups.  So that was the logic...
>
>> So ultimately, what we are looking at is the complete and utter
>> removal of any code which references a relocation adjustment in lieu
>> of each arch either:
>>
>>   a) Execute in Place from Flash, or;
>>   b) Setting a fixed TEXT_BASE at a known RAM location and copying
>>      the contents of Flash to RAM, or;
>>   c) Implementing full Relocation
>
> d) Leaving those architectures the way they are now
> Could be added if a,b,c won't work for some reason too.

Which is essentially either a) or b) depending on which way the arch
was implemented. For x86, it has been b) but it is going towards c)

>
> I think it would be great to remove any manual relocation adjustments in
> the long run.  This isn't strictly necessary though, as we can still
> have manual relocations littering the code - its just a bit dirty and
> prone to issues in the long run.
>
> So my vote would be to shoot for c) for all arches, but I have no idea
> what impact that would have on them:)

So the big question now is: how many arches do partial relocation
and really need gd->reloc_off?

>
> Best,
> Peter
>
>

Regards,

Graeme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-08 23:12         ` Joakim Tjernlund
  2009-10-09  0:09           ` J. William Campbell
@ 2009-10-10  4:43           ` Graeme Russ
  2009-10-10  8:07             ` Joakim Tjernlund
  1 sibling, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-10  4:43 UTC (permalink / raw)
  To: u-boot

On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
>>
>> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
>> <jwilliamcampbell@comcast.net> wrote:
>> > Graeme Russ wrote:
>> >>
>> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
>> >> <jwilliamcampbell@comcast.net> wrote:
>> >>
>> >>>
>> >>> Graeme Russ wrote:
>> >>>
>> >>>>
>> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
>> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>> >>>> the results (fixed width font will help - its space, not tab,
>> >>>> formatted):
>> >>>>
>> >>>> Section             non-reloc     reloc
>> >>>> ---------------------------------------
>> >>>> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
>> >>>> .rodata              00005bad  000059d0
>> >>>> .interp              n/a       00000013
>> >>>> .dynstr              n/a       00000648
>> >>>> .hash                n/a       00000428
>> >>>> .eh_frame            00003268  000034fc
>> >>>> .data                00000a6c  000001dc
>> >>>> .data.rel            n/a       00000098
>> >>>> .data.rel.ro.local   n/a       00000178
>> >>>> .data.rel.local      n/a       000007e4
>> >>>> .got                 00000000  000001f0
>> >>>> .got.plt             n/a       0000000c
>> >>>> .rel.got             n/a       000003e0
>> >>>> .rel.dyn             n/a       00001228
>> >>>> .dynsym              n/a       00000850
>> >>>> .dynamic             n/a       00000080
>> >>>> .u_boot_cmd          000003c0  000003c0
>> >>>> .bss                 00001a34  00001a34
>> >>>> .realmode            00000166  00000166
>> >>>> .bios                0000053e  0000053e
>> >>>> =======================================
>> >>>> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
>> >>>>
>> >>>> Its more than a 16% increase in size!!!
>> >>>>
>> >>>> .text accounts for a little under half of the total bloat, and of that,
>> >>>> the crude dynamic loader accounts for only 341 bytes
>> >>>>
>> >>>>
>> >>>
>> >>> Hi Graeme,
>> >>>     I would be interested in a third option (column), the x86 build with
>> >>> just -mrelocateable but NOT -fpic. It will not be definitive because
>> >>> there
>> >>> will be extra code that references the GOT and missing code to do some of
>> >>> the relocation, but it would still be interesting.
>> >>>
>> >>
>> >> x86 does not have -mrelocatable. This is a PPC only option :(
>> >>
>> >
>> > Hi Graeme,
>> >          You are unfortunately correct. However, I wonder if we can get
>> > essentially the same result by executing the final ld step with the
>> > --emit-relocs switch included. This may also include some "extra" sections
>> > that we would want to strip out, but if it works, it could give all
>> > ELF-based systems a way to a relocatable u-boot.
>> >
>>
>> I don't think --emit-relocs is necessary with -pic. I haven't gone through
>> all the permutations to see if there is a smaller option, but gcc -fpic and
>> ld -pie creates enough information to perform relocation on the x86
>> platform
>
> Try -fvisibility=hidden

Thanks - Shaved another 2539 bytes off the binary

Also found out how to get rid of .eh_frame (crept in when I upgraded to
gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes

Total saving of about 15.6 KiB (15991 bytes)

>
>  Jocke
>
>

Regards,

Graeme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-10  4:43           ` Graeme Russ
@ 2009-10-10  8:07             ` Joakim Tjernlund
  2009-10-10  8:46               ` Graeme Russ
  0 siblings, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-10  8:07 UTC (permalink / raw)
  To: u-boot

Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
>
> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
> <joakim.tjernlund@transmode.se> wrote:
> >>
> >> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
> >> <jwilliamcampbell@comcast.net> wrote:
> >> > Graeme Russ wrote:
> >> >>
> >> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
> >> >> <jwilliamcampbell@comcast.net> wrote:
> >> >>
> >> >>>
> >> >>> Graeme Russ wrote:
> >> >>>
> >> >>>>
> >> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
> >> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
> >> >>>> the results (fixed width font will help - its space, not tab,
> >> >>>> formatted):
> >> >>>>
> >> >>>> Section             non-reloc     reloc
> >> >>>> ---------------------------------------
> >> >>>> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
> >> >>>> .rodata              00005bad  000059d0
> >> >>>> .interp              n/a       00000013
> >> >>>> .dynstr              n/a       00000648
> >> >>>> .hash                n/a       00000428
> >> >>>> .eh_frame            00003268  000034fc
> >> >>>> .data                00000a6c  000001dc
> >> >>>> .data.rel            n/a       00000098
> >> >>>> .data.rel.ro.local   n/a       00000178
> >> >>>> .data.rel.local      n/a       000007e4
> >> >>>> .got                 00000000  000001f0
> >> >>>> .got.plt             n/a       0000000c
> >> >>>> .rel.got             n/a       000003e0
> >> >>>> .rel.dyn             n/a       00001228
> >> >>>> .dynsym              n/a       00000850
> >> >>>> .dynamic             n/a       00000080
> >> >>>> .u_boot_cmd          000003c0  000003c0
> >> >>>> .bss                 00001a34  00001a34
> >> >>>> .realmode            00000166  00000166
> >> >>>> .bios                0000053e  0000053e
> >> >>>> =======================================
> >> >>>> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
> >> >>>>
> >> >>>> Its more than a 16% increase in size!!!
> >> >>>>
> >> >>>> .text accounts for a little under half of the total bloat, and of that,
> >> >>>> the crude dynamic loader accounts for only 341 bytes
> >> >>>>
> >> >>>>
> >> >>>
> >> >>> Hi Graeme,
> >> >>>     I would be interested in a third option (column), the x86 build with
> >> >>> just -mrelocateable but NOT -fpic. It will not be definitive because
> >> >>> there
> >> >>> will be extra code that references the GOT and missing code to do some of
> >> >>> the relocation, but it would still be interesting.
> >> >>>
> >> >>
> >> >> x86 does not have -mrelocatable. This is a PPC only option :(
> >> >>
> >> >
> >> > Hi Graeme,
> >> >          You are unfortunately correct. However, I wonder if we can get
> >> > essentially the same result by executing the final ld step with the
> >> > --emit-relocs switch included. This may also include some "extra" sections
> >> > that we would want to strip out, but if it works, it could give all
> >> > ELF-based systems a way to a relocatable u-boot.
> >> >
> >>
> >> I don't think --emit-relocs is necessary with -pic. I haven't gone through
> >> all the permutations to see if there is a smaller option, but gcc -fpic and
> >> ld -pie creates enough information to perform relocation on the x86
> >> platform
> >
> > Try -fvisibility=hidden
>
> Thanks - Shaved another 2539 bytes off the binary
>
> Also found out how to get rid of .eh_frame (crept in when I upgraded to
> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
>
> Total saving of 15.6k

Great, so now you are back at just a few percent added I guess?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-10  8:07             ` Joakim Tjernlund
@ 2009-10-10  8:46               ` Graeme Russ
  2009-10-10  9:27                 ` Joakim Tjernlund
  0 siblings, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-10  8:46 UTC (permalink / raw)
  To: u-boot

On Sat, Oct 10, 2009 at 7:07 PM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
>>
>> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>> >>
>> >> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
>> >> <jwilliamcampbell@comcast.net> wrote:
>> >> > Graeme Russ wrote:
>> >> >>
>> >> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
>> >> >> <jwilliamcampbell@comcast.net> wrote:
>> >> >>
>> >> >>>
>> >> >>> Graeme Russ wrote:
>> >> >>>
>> >> >>>>
>> >> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
>> >> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>> >> >>>> the results (fixed width font will help - its space, not tab,
>> >> >>>> formatted):
>> >> >>>>
>> >> >>>> Section             non-reloc     reloc
>> >> >>>> ---------------------------------------
>> >> >>>> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
>> >> >>>> .rodata              00005bad  000059d0
>> >> >>>> .interp              n/a       00000013
>> >> >>>> .dynstr              n/a       00000648
>> >> >>>> .hash                n/a       00000428
>> >> >>>> .eh_frame            00003268  000034fc
>> >> >>>> .data                00000a6c  000001dc
>> >> >>>> .data.rel            n/a       00000098
>> >> >>>> .data.rel.ro.local   n/a       00000178
>> >> >>>> .data.rel.local      n/a       000007e4
>> >> >>>> .got                 00000000  000001f0
>> >> >>>> .got.plt             n/a       0000000c
>> >> >>>> .rel.got             n/a       000003e0
>> >> >>>> .rel.dyn             n/a       00001228
>> >> >>>> .dynsym              n/a       00000850
>> >> >>>> .dynamic             n/a       00000080
>> >> >>>> .u_boot_cmd          000003c0  000003c0
>> >> >>>> .bss                 00001a34  00001a34
>> >> >>>> .realmode            00000166  00000166
>> >> >>>> .bios                0000053e  0000053e
>> >> >>>> =======================================
>> >> >>>> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
>> >> >>>>
>> >> >>>> Its more than a 16% increase in size!!!
>> >> >>>>
>> >> >>>> .text accounts for a little under half of the total bloat, and of that,
>> >> >>>> the crude dynamic loader accounts for only 341 bytes
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>> Hi Graeme,
>> >> >>>     I would be interested in a third option (column), the x86 build with
>> >> >>> just -mrelocateable but NOT -fpic. It will not be definitive because
>> >> >>> there
>> >> >>> will be extra code that references the GOT and missing code to do some of
>> >> >>> the relocation, but it would still be interesting.
>> >> >>>
>> >> >>
>> >> >> x86 does not have -mrelocatable. This is a PPC only option :(
>> >> >>
>> >> >
>> >> > Hi Graeme,
>> >> >          You are unfortunately correct. However, I wonder if we can get
>> >> > essentially the same result by executing the final ld step with the
>> >> > --emit-relocs switch included. This may also include some "extra" sections
>> >> > that we would want to strip out, but if it works, it could give all
>> >> > ELF-based systems a way to a relocatable u-boot.
>> >> >
>> >>
>> >> I don't think --emit-relocs is necessary with -pic. I haven't gone through
>> >> all the permutations to see if there is a smaller option, but gcc -fpic and
>> >> ld -pie creates enough information to perform relocation on the x86
>> >> platform
>> >
>> > Try -fvisibility=hidden
>>
>> Thanks - Shaved another 2539 bytes off the binary
>>
>> Also found out how to get rid of .eh_frame (crept in when I upgraded to
>> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
>>
>> Total saving of 15.6k
>
> Great, so now you are back at just a few percent added I guess?
>
>

Not really - the .eh_frame saving applies to both relocated and
non-relocated builds.
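
A rough estimate from the table at the top of the thread illustrates
this. The sketch below assumes, purely for illustration, that dropping
.eh_frame removes exactly the .eh_frame bytes listed in that table from
each build; the measured saving differs somewhat. The helper names are
hypothetical.

```c
#include <assert.h>

/* Sizes in bytes, taken from the table in the original post. */
#define TOTAL_NON_RELOC		0x0001d5ddu
#define TOTAL_RELOC		0x00022287u
#define EH_FRAME_NON_RELOC	0x00003268u
#define EH_FRAME_RELOC		0x000034fcu

/* Hypothetical helper: size penalty of the relocatable build, in percent. */
double penalty_percent(unsigned non_reloc, unsigned reloc)
{
	assert(non_reloc > 0 && reloc >= non_reloc);
	return 100.0 * (double)(reloc - non_reloc) / (double)non_reloc;
}

/* Penalty with .eh_frame still present in both builds. */
double penalty_with_eh_frame(void)
{
	return penalty_percent(TOTAL_NON_RELOC, TOTAL_RELOC);
}

/* Penalty assuming .eh_frame is dropped entirely from both builds. */
double penalty_without_eh_frame(void)
{
	return penalty_percent(TOTAL_NON_RELOC - EH_FRAME_NON_RELOC,
			       TOTAL_RELOC - EH_FRAME_RELOC);
}
```

Under those assumptions the penalty is roughly 16.3% with .eh_frame and
roughly 17.7% without it, so the relative cost of relocation does not
shrink.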


Regards,

Graeme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-10  8:46               ` Graeme Russ
@ 2009-10-10  9:27                 ` Joakim Tjernlund
  2009-10-10 10:38                   ` Graeme Russ
  0 siblings, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-10  9:27 UTC (permalink / raw)
  To: u-boot



Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 10:46:52:
>
> On Sat, Oct 10, 2009 at 7:07 PM, Joakim Tjernlund
> <joakim.tjernlund@transmode.se> wrote:
> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
> >>
> >> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
> >> <joakim.tjernlund@transmode.se> wrote:
> >> >>
> >> >> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
> >> >> <jwilliamcampbell@comcast.net> wrote:
> >> >> > Graeme Russ wrote:
> >> >> >>
> >> >> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
> >> >> >> <jwilliamcampbell@comcast.net> wrote:
> >> >> >>
> >> >> >>>
> >> >> >>> Graeme Russ wrote:
> >> >> >>>
> >> >> >>>>
> >> >> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
> >> >> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
> >> >> >>>> the results (fixed width font will help - its space, not tab,
> >> >> >>>> formatted):
> >> >> >>>>
> >> >> >>>> Section             non-reloc     reloc
> >> >> >>>> ---------------------------------------
> >> >> >>>> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
> >> >> >>>> .rodata              00005bad  000059d0
> >> >> >>>> .interp              n/a       00000013
> >> >> >>>> .dynstr              n/a       00000648
> >> >> >>>> .hash                n/a       00000428
> >> >> >>>> .eh_frame            00003268  000034fc
> >> >> >>>> .data                00000a6c  000001dc
> >> >> >>>> .data.rel            n/a       00000098
> >> >> >>>> .data.rel.ro.local   n/a       00000178
> >> >> >>>> .data.rel.local      n/a       000007e4
> >> >> >>>> .got                 00000000  000001f0
> >> >> >>>> .got.plt             n/a       0000000c
> >> >> >>>> .rel.got             n/a       000003e0
> >> >> >>>> .rel.dyn             n/a       00001228
> >> >> >>>> .dynsym              n/a       00000850
> >> >> >>>> .dynamic             n/a       00000080
> >> >> >>>> .u_boot_cmd          000003c0  000003c0
> >> >> >>>> .bss                 00001a34  00001a34
> >> >> >>>> .realmode            00000166  00000166
> >> >> >>>> .bios                0000053e  0000053e
> >> >> >>>> =======================================
> >> >> >>>> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
> >> >> >>>>
> >> >> >>>> Its more than a 16% increase in size!!!
> >> >> >>>>
> >> >> >>>> .text accounts for a little under half of the total bloat, and of that,
> >> >> >>>> the crude dynamic loader accounts for only 341 bytes
> >> >> >>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>> Hi Graeme,
> >> >> >>>     I would be interested in a third option (column), the x86 build with
> >> >> >>> just -mrelocateable but NOT -fpic. It will not be definitive because
> >> >> >>> there
> >> >> >>> will be extra code that references the GOT and missing code to do some of
> >> >> >>> the relocation, but it would still be interesting.
> >> >> >>>
> >> >> >>
> >> >> >> x86 does not have -mrelocatable. This is a PPC only option :(
> >> >> >>
> >> >> >
> >> >> > Hi Graeme,
> >> >> >          You are unfortunately correct. However, I wonder if we can get
> >> >> > essentially the same result by executing the final ld step with the
> >> >> > --emit-relocs switch included. This may also include some "extra" sections
> >> >> > that we would want to strip out, but if it works, it could give all
> >> >> > ELF-based systems a way to a relocatable u-boot.
> >> >> >
> >> >>
> >> >> I don't think --emit-relocs is necessary with -pic. I haven't gone through
> >> >> all the permutations to see if there is a smaller option, but gcc -fpic and
> >> >> ld -pie creates enough information to perform relocation on the x86
> >> >> platform
> >> >
> >> > Try -fvisibility=hidden
> >>
> >> Thanks - Shaved another 2539 bytes off the binary
> >>
> >> Also found out how to get rid of .eh_frame (crept in when I upgraded to
> >> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
> >>
> >> Total saving of 15.6k
> >
> > Great, so now you are back at just a few percent added I guess?
> >
> >
>
> Not really - The .eh_frame saving applies to both relocated and non
> relocated builds

OK, so you didn't use PIC before at all?

Anyway, I think you can do more. Using -Bsymbolic you should get
away with RELATIVE relocs only and be able to drop a lot of the sections
above. Have a look at uClibc's ldso/ldso/dl-startup.c.
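
If only RELATIVE relocs remain, the fixup loop can be very small. The
following is a minimal sketch, not taken from uClibc or U-Boot: struct
rel32, apply_relative_relocs() and its parameters are hypothetical
names. It relies only on the ELF32 REL entry layout (offset, info) and
on R_386_RELATIVE having the value 8, with the addend stored in the word
being patched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as an ELF32 REL entry (hypothetical local definition). */
struct rel32 {
	uint32_t r_offset;	/* link-time address of the word to patch */
	uint32_t r_info;	/* (symbol index << 8) | relocation type */
};

#define REL32_TYPE(info)	((info) & 0xffu)
#define RELOC_386_RELATIVE	8u	/* value of R_386_RELATIVE */

/*
 * Patch an image that was linked at link_base so it can run at load_base.
 * 'image' points at the bytes to patch. Returns the number of entries
 * applied, or -1 if an entry is not RELATIVE or points outside the image.
 */
int apply_relative_relocs(uint8_t *image, size_t image_size,
			  uint32_t link_base, uint32_t load_base,
			  const struct rel32 *rel, size_t count)
{
	uint32_t delta = load_base - link_base;
	size_t i;

	assert(image != NULL && (rel != NULL || count == 0));

	for (i = 0; i < count; i++) {
		uint32_t off, word;

		if (REL32_TYPE(rel[i].r_info) != RELOC_386_RELATIVE)
			return -1;

		off = rel[i].r_offset - link_base;
		if (image_size < sizeof(word) ||
		    off > image_size - sizeof(word))
			return -1;

		/* REL format: the addend lives in the patched word itself. */
		memcpy(&word, image + off, sizeof(word));
		word += delta;
		memcpy(image + off, &word, sizeof(word));
	}
	return (int)count;
}
```

In a real build the table would come from the .rel.dyn section and the
two base addresses from the linker script and the chosen RAM location.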

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-10  9:27                 ` Joakim Tjernlund
@ 2009-10-10 10:38                   ` Graeme Russ
  2009-10-10 10:47                     ` Joakim Tjernlund
  0 siblings, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-10 10:38 UTC (permalink / raw)
  To: u-boot

On Sat, Oct 10, 2009 at 8:27 PM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
>
>
> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 10:46:52:
>>
>> On Sat, Oct 10, 2009 at 7:07 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
>> >>
>> >> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
>> >> <joakim.tjernlund@transmode.se> wrote:
>> >> > Try -fvisibility=hidden
>> >>
>> >> Thanks - Shaved another 2539 bytes off the binary
>> >>
>> >> Also found out how to get rid of .eh_frame (crept in when I upgraded to
>> >> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
>> >>
>> >> Total saving of 15.6k
>> >
>> > Great, so now you are back at just a few percent added I guess?
>> >
>> >
>>
>> Not really - The .eh_frame saving applies to both relocated and non
>> relocated builds
>
> OK, so you didn't use PIC before at all?
>
> Anyway I think you can do more. Using -Bsymbolic you should get
> away with RELATIVE relocs only and be able to skip a lot of segments above.
> Have a look at uClibc ldso/ldso/dl-startup.c
>
>

My build options thus far are:

PLATFORM_RELFLAGS += -fpie -fvisibility=hidden
PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
PLATFORM_LDFLAGS += -pie

-fpic / -pic make no difference

Interestingly, -Bsymbolic adds exactly 8 bytes to .dynamic, but doesn't
change the size of any other section

Pulling apart the relocation sections, it seems that all relocations are
already RELATIVE even without -Bsymbolic
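
One way to double-check the observation above is to tally the relocation
types found in a raw dump of .rel.dyn. The following is only a sketch: it
assumes a little-endian i386 REL table (8-byte entries of r_offset and
r_info), and tally_reloc_types is a hypothetical helper name, not part of
U-Boot or binutils.

```python
import struct
from collections import Counter

# Relocation type numbers from the i386 ELF psABI (subset).
R_386_NAMES = {
    0: "R_386_NONE",
    1: "R_386_32",
    2: "R_386_PC32",
    5: "R_386_COPY",
    6: "R_386_GLOB_DAT",
    7: "R_386_JMP_SLOT",
    8: "R_386_RELATIVE",
}


def tally_reloc_types(rel_table):
    """Count relocation types in a raw Elf32_Rel table (bytes)."""
    counts = Counter()
    # Each Elf32_Rel entry is two little-endian 32-bit words.
    for _r_offset, r_info in struct.iter_unpack("<II", rel_table):
        rtype = r_info & 0xFF  # low byte of r_info is the type
        counts[R_386_NAMES.get(rtype, "type %d" % rtype)] += 1
    return counts
```

If the only key that comes back is R_386_RELATIVE, the loader does not
need the symbol table to process that section.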

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-10 10:38                   ` Graeme Russ
@ 2009-10-10 10:47                     ` Joakim Tjernlund
  2009-10-10 11:21                       ` Graeme Russ
  2009-10-10 16:52                       ` Mike Frysinger
  0 siblings, 2 replies; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-10 10:47 UTC (permalink / raw)
  To: u-boot



Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
>
> My build options thus far are:
>
> PLATFORM_RELFLAGS += -fpie -fvisibility=hidden
> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
> PLATFORM_LDFLAGS += -pie
>
> -fpic / -pic make no difference

not on x86, on ppc it is a big difference.

>
> Interestingly, -Bsymbolic adds exactly 8 bytes to .dynamic, but doesn't
> change the size of any other section
>
> Pulling apart the relocation sections, it seems that all relocations are
> already RELATIVE even without -Bsymbolic

Ah, that is because you built an exe with -pie
Then you should be able to drop everything but the RELATIVE
from the linking, or almost in any case.

 Jocke
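
If only RELATIVE relocations remain, the relocation pass reduces to adding
the load offset to each patched word. The sketch below models that in
Python on a byte buffer, purely for illustration; it is not the loader
from this thread. It assumes i386 REL entries (the addend lives in the
word being patched), and the names apply_relative_relocs, link_base and
load_base are hypothetical.

```python
import struct

R_386_RELATIVE = 8  # relocation type number from the i386 ELF psABI


def apply_relative_relocs(image, rel_table, link_base, load_base):
    """Patch image (a bytearray) in place; return the number of fixups.

    image     -- the bytes of the binary, starting at link_base
    rel_table -- raw Elf32_Rel entries (r_offset, r_info), little-endian
    link_base -- address the image was linked to run at
    load_base -- address the image has actually been copied to
    """
    delta = (load_base - link_base) & 0xFFFFFFFF
    patched = 0
    for r_offset, r_info in struct.iter_unpack("<II", rel_table):
        if r_info & 0xFF != R_386_RELATIVE:
            # Anything else would need the symbol table; refuse loudly.
            raise ValueError("unsupported relocation type %d" % (r_info & 0xFF))
        where = r_offset - link_base  # r_offset is a link-time address
        (value,) = struct.unpack_from("<I", image, where)
        struct.pack_into("<I", image, where, (value + delta) & 0xFFFFFFFF)
        patched += 1
    return patched
```

A loader restricted to this one relocation type never has to consult
.dynsym or .dynstr, which is the point of getting down to RELATIVE only.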

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-10 10:47                     ` Joakim Tjernlund
@ 2009-10-10 11:21                       ` Graeme Russ
  2009-10-10 15:38                         ` Joakim Tjernlund
       [not found]                         ` <4AD0B3D7.7020900@comcast.net>
  2009-10-10 16:52                       ` Mike Frysinger
  1 sibling, 2 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-10 11:21 UTC (permalink / raw)
  To: u-boot

On Sat, Oct 10, 2009 at 9:47 PM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
>
>
> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
>>
>> My build options thus far are:
>>
>> PLATFORM_RELFLAGS += -fpie -fvisibility=hidden
>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>> PLATFORM_LDFLAGS += -pie
>>
>> -fpic / -pic make no difference
>
> not on x86, on ppc it is a big difference.
>
>>
>> Interestingly, -Bsymbolic adds exactly 8 bytes to .dynamic, but doesn't
>> change the size of any other section
>>
>> Pulling apart the relocation sections, it seems that all relocations are
>> already RELATIVE even without -Bsymbolic
>
> Ah, that is because you built an exe with -pie
> Then you should be able to drop everything but the RELATIVE
> from the linking, or almost in any case.
>
>  Jocke
>
>

Hmm, so it seems I may have hit the limit. I tried:

PLATFORM_LDFLAGS += -r --emit-relocs

but there is not enough information left to complete the relocation. It
seems as though I need .rel.got, .got.plt, .dynsym and .rel.dyn in order
to find the actual bytes that need modifying (it also seems to mess with
the size of the stripped binary for some reason)

Looks like I'll have to proceed with my original plan - a bit bloated,
but it works

Graeme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-10 11:21                       ` Graeme Russ
@ 2009-10-10 15:38                         ` Joakim Tjernlund
  2009-10-11 10:47                           ` Graeme Russ
       [not found]                         ` <4AD0B3D7.7020900@comcast.net>
  1 sibling, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-10 15:38 UTC (permalink / raw)
  To: u-boot

Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 13:21:10:
>
> On Sat, Oct 10, 2009 at 9:47 PM, Joakim Tjernlund
> <joakim.tjernlund@transmode.se> wrote:
> >
> >
> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
> >>
> >> My build options thus far are:
> >>
> >> PLATFORM_RELFLAGS += -fpie -fvisibility=hidden
> >> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
> >> PLATFORM_LDFLAGS += -pie
> >>
> >> -fpic / -pic make no difference
> >
> > not on x86, on ppc it is a big difference.
> >
> >>
> >> Interestingly, -Bsymbolic adds exactly 8 bytes to .dynamic, but doesn't
> >> change the size of any other section
> >>
> >> Pulling apart the relocation sections, it seems that all relocations are
> >> already RELATIVE even without -Bsymbolic
> >
> > Ah, that is because you built an exe with -pie
> > Then you should be able to drop everything but the RELATIVE
> > from the linking, or almost in any case.
> >
> >  Jocke
> >
> >
>
> Hmm, so it seems I may have hit the limit. I tried:
>
> PLATFORM_LDFLAGS += -r --emit-relocs
>
> but there is not enough information left to complete the relocation. It
> seems as though I need .rel.got, .got.plt, .dynsym and .rel.dyn in order
> to find the actual bytes that need modifying (it also seems to mess with
> the size of the stripped binary for some reason)
>
> Looks like I'll have to proceed with my original plan - a bit bloated,
> but it works

Relocation costs :(

I am not sure why you need .got.plt; it should be empty.
What is in it?
Same with .dynsym, what is in it?

Memory fails me, but since u-boot is a freestanding app I think
these two might not be needed. Perhaps there are weak unresolved
syms in there?

     Jocke

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-10 10:47                     ` Joakim Tjernlund
  2009-10-10 11:21                       ` Graeme Russ
@ 2009-10-10 16:52                       ` Mike Frysinger
  2009-10-10 17:45                         ` Joakim Tjernlund
  1 sibling, 1 reply; 47+ messages in thread
From: Mike Frysinger @ 2009-10-10 16:52 UTC (permalink / raw)
  To: u-boot

On Saturday 10 October 2009 06:47:42 Joakim Tjernlund wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
> > -fpic / -pic make no difference
> 
> not on x86, on ppc it is a big difference.

i think you guys mean -fpic and -fPIC because there is no -pic flag ... while
the two make a big diff on some arches like ppc, they make pretty much no
difference on x86 last i looked
-mike

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-10 16:52                       ` Mike Frysinger
@ 2009-10-10 17:45                         ` Joakim Tjernlund
  2009-10-11  0:43                           ` Graeme Russ
  0 siblings, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-10 17:45 UTC (permalink / raw)
  To: u-boot

Mike Frysinger <vapier@gentoo.org> wrote on 10/10/2009 18:52:29:
>
> On Saturday 10 October 2009 06:47:42 Joakim Tjernlund wrote:
> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
> > > -fpic / -pic make no difference
> >
> > not on x86, on ppc it is a big difference.
>
> i think you guys mean -fpic and -fPIC because there is no -pic flag ... while
> the two make a big diff on some arches like ppc, they make pretty much no
> difference on x86 last i looked

Yes, this was what I was thinking (-fpic vs. -fPIC). These will probably only
make a difference on RISC-like arches.

 Jocke

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-10 17:45                         ` Joakim Tjernlund
@ 2009-10-11  0:43                           ` Graeme Russ
  0 siblings, 0 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-11  0:43 UTC (permalink / raw)
  To: u-boot

On Sun, Oct 11, 2009 at 4:45 AM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
> Mike Frysinger <vapier@gentoo.org> wrote on 10/10/2009 18:52:29:
>>
>> On Saturday 10 October 2009 06:47:42 Joakim Tjernlund wrote:
>> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
>> > > -fpic / -pic make no difference
>> >
>> > not on x86, on ppc it is a big difference.
>>
>> i think you guys mean -fpic and -fPIC because there is no -pic flag ... while
>> the two make a big diff on some arches like ppc, they make pretty much no
>> difference on x86 last i looked

Sorry for the confusion - by -fpic / -pic I was referring to -fpic (gcc) /
-pic (ld) flags versus -fpie (gcc) / -pie (ld) flags.

>
> Yes, this was what I was thinking (-fpic vs. -fPIC). These will probably only
> make a difference on RISC-like arches.
>

There appears to be no difference (on x86) between pic, PIC, and pie. The
big difference is when I drop ld's -pic and use ld's --emit-relocs instead

>  Jocke
>
>

Regards,

Graeme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
       [not found]                         ` <4AD0B3D7.7020900@comcast.net>
@ 2009-10-11  1:31                           ` Graeme Russ
  0 siblings, 0 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-11  1:31 UTC (permalink / raw)
  To: u-boot

On Sun, Oct 11, 2009 at 3:18 AM, J. William Campbell
<jwilliamcampbell@comcast.net> wrote:
> Graeme Russ wrote:
>>
>> On Sat, Oct 10, 2009 at 9:47 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>>
>>>
>>> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
>>>
>>>>
>>>> My build options thus far are:
>>>>
>>>> PLATFORM_RELFLAGS += -fpie -fvisibility=hidden
>>>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>>>> PLATFORM_LDFLAGS += -pie
>>>>
>>>> -fpic / -pic make no difference
>>>>
>>>
>>> not on x86, on ppc it is a big difference.
>>>
>>>
>>>>
>>>> Interestingly, -Bsymbolic adds exactly 8 bytes to .dynamic, but doesn't
>>>> change the size of any other section
>>>>
>>>> Pulling apart the relocation sections, it seems that all relocations are
>>>> already RELATIVE even without -Bsymbolic
>>>>
>>>
>>> Ah, that is because you built an exe with -pie
>>> Then you should be able to drop everything but the RELATIVE
>>> from the linking, or almost in any case.
>>>
>>>  Jocke
>>>
>>>
>>>
>>
>> Hmm, so its seems I may have hit the limit. I tried:
>>
>> PLATFORM_LDFLAGS += -r --emit-relocs
>>
>> but there is not enough information left to complete the relocation.
>
> Hi Graeme,
>     I am glad you tried this. It should work, -fpie should not be necessary.
> Did you also change PLATFORM_RELFLAGS to omit the -fpie? Without pie, and
> with no libraries linked in that are pie, there should BE no .got, AFAIK. I
> wonder if absolutely everything is getting re-built, like maybe there is a
> library routine that is being linked in? What exactly was missing when you
> compiled and linked without pie?

I just tried with:

PLATFORM_RELFLAGS += -fvisibility=hidden
PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
PLATFORM_LDFLAGS += -Bsymbolic --emit-relocs

The linker output does contain relocation information, but those sections
are not marked for allocation, so they get stripped out when creating
u-boot.bin
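
The stripping follows from how a raw binary image is produced: roughly
speaking, only sections that are flagged as allocated and that carry file
contents end up in the output. A minimal sketch of that test, using the
flag values from the ELF specification. The helper name
kept_in_raw_binary is made up for illustration and is not part of
binutils or U-Boot:

```c
#include <stdbool.h>
#include <stdint.h>

#define SHT_NOBITS 8u    /* section has no contents in the file (e.g. .bss) */
#define SHF_ALLOC  0x2u  /* section occupies memory at run time */

/* Hypothetical helper: would this section's bytes survive the
 * conversion to a flat binary image? */
bool kept_in_raw_binary(uint32_t sh_type, uint32_t sh_flags)
{
    return (sh_flags & SHF_ALLOC) != 0 && sh_type != SHT_NOBITS;
}
```

The .rel.* sections left behind by --emit-relocs have no SHF_ALLOC flag,
which is consistent with them disappearing from u-boot.bin.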

>
> Best Regards,
> Bill Campbell
>>
>>  It
>> seems as though I need .rel.got, .got.plt, .dynsym and .rel.dyn in order
>> to find the actual bytes that need modifying (it also seems to mess with
>> the size of the stripped binary for some reason)
>>
>> Looks like I'll have to proceed with my original plan - a bit bloated,
>> but it works
>>
>> Graeme
>>
>>
>>
>
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-10 15:38                         ` Joakim Tjernlund
@ 2009-10-11 10:47                           ` Graeme Russ
       [not found]                             ` <OF83D1271F.04B67606-ONC125764C.0045BFF2-C125764C.0046AC45@transmode.se>
  0 siblings, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-11 10:47 UTC (permalink / raw)
  To: u-boot

On Sun, Oct 11, 2009 at 2:38 AM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 13:21:10:
>>
>> On Sat, Oct 10, 2009 at 9:47 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>> >
>> >
>> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 12:38:19:
>> >>
>> >> On Sat, Oct 10, 2009 at 8:27 PM, Joakim Tjernlund
>> >> <joakim.tjernlund@transmode.se> wrote:
>> >> >
>> >> >
>> >> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 10:46:52:
>> >> >>
>> >> >> On Sat, Oct 10, 2009 at 7:07 PM, Joakim Tjernlund
>> >> >> <joakim.tjernlund@transmode.se> wrote:
>> >> >> > Graeme Russ <graeme.russ@gmail.com> wrote on 10/10/2009 06:43:52:
>> >> >> >>
>> >> >> >> On Fri, Oct 9, 2009 at 10:12 AM, Joakim Tjernlund
>> >> >> >> <joakim.tjernlund@transmode.se> wrote:
>> >> >> >> >>
>> >> >> >> >> On Fri, Oct 9, 2009 at 9:27 AM, J. William Campbell
>> >> >> >> >> <jwilliamcampbell@comcast.net> wrote:
>> >> >> >> >> > Graeme Russ wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
>> >> >> >> >> >> <jwilliamcampbell@comcast.net> wrote:
>> >> >> >> >> >>
>> >> >> >> >> >>>
>> >> >> >> >> >>> Graeme Russ wrote:
>> >> >> >> >> >>>
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> Out of curiosity, I wanted to see just how much of a size penalty I am
>> >> >> >> >> >>>> incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
>> >> >> >> >> >>>> the results (fixed width font will help - its space, not tab,
>> >> >> >> >> >>>> formatted):
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> Section             non-reloc     reloc
>> >> >> >> >> >>>> ---------------------------------------
>> >> >> >> >> >>>> .text                000118c4  000137fc <- 0x1f38 bytes (~8kB) bigger
>> >> >> >> >> >>>> .rodata              00005bad  000059d0
>> >> >> >> >> >>>> .interp              n/a       00000013
>> >> >> >> >> >>>> .dynstr              n/a       00000648
>> >> >> >> >> >>>> .hash                n/a       00000428
>> >> >> >> >> >>>> .eh_frame            00003268  000034fc
>> >> >> >> >> >>>> .data                00000a6c  000001dc
>> >> >> >> >> >>>> .data.rel            n/a       00000098
>> >> >> >> >> >>>> .data.rel.ro.local   n/a       00000178
>> >> >> >> >> >>>> .data.rel.local      n/a       000007e4
>> >> >> >> >> >>>> .got                 00000000  000001f0
>> >> >> >> >> >>>> .got.plt             n/a       0000000c
>> >> >> >> >> >>>> .rel.got             n/a       000003e0
>> >> >> >> >> >>>> .rel.dyn             n/a       00001228
>> >> >> >> >> >>>> .dynsym              n/a       00000850
>> >> >> >> >> >>>> .dynamic             n/a       00000080
>> >> >> >> >> >>>> .u_boot_cmd          000003c0  000003c0
>> >> >> >> >> >>>> .bss                 00001a34  00001a34
>> >> >> >> >> >>>> .realmode            00000166  00000166
>> >> >> >> >> >>>> .bios                0000053e  0000053e
>> >> >> >> >> >>>> =======================================
>> >> >> >> >> >>>> Total                0001d5dd  00022287 <- 0x4caa bytes (~19kB) bigger
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> Its more than a 16% increase in size!!!
>> >> >> >> >> >>>>
>> >> >> >> >> >>>> .text accounts for a little under half of the total bloat, and of that,
>> >> >> >> >> >>>> the crude dynamic loader accounts for only 341 bytes
>> >> >> >> >> >>>>
>> >> >> >> >> >>>>
>> >> >> >> >> >>>
>> >> >> >> >> >>> Hi Graeme,
>> >> >> >> >> >>>     I would be interested in a third option (column), the x86 build with
>> >> >> >> >> >>> just -mrelocateable but NOT -fpic. It will not be definitive because
>> >> >> >> >> >>> there
>> >> >> >> >> >>> will be extra code that references the GOT and missing code todo some of
>> >> >> >> >> >>> the relocation, but it would still be interesting.
>> >> >> >> >> >>>
>> >> >> >> >> >>
>> >> >> >> >> >> x86 does not have -mrelocatable. This is a PPC only option :(
>> >> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> > Hi Graeme,
>> >> >> >> >> >          You are unfortunately correct. However, I wonder if we can get
>> >> >> >> >> > essentially the same result by executing the final ld step with the
>> >> >> >> >> > --emit-relocs switch included. This may also include some "extra" sections
>> >> >> >> >> > that we would want to strip out, but if it works, it could give all
>> >> >> >> >> > ELF-based systems a way to a relocatable u-boot.
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> I don't think --emit-relocs is necessary with -pic. I haven't gone through
>> >> >> >> >> all the permutations to see if there is a smaller option, but gcc -fpic and
>> >> >> >> >> ld -pie creates enough information to perform relocation on the x86
>> >> >> >> >> platform
>> >> >> >> >
>> >> >> >> > Try -fvisibility=hidden
>> >> >> >>
>> >> >> >> Thanks - Shaved another 2539 bytes off the binary
>> >> >> >>
>> >> >> >> Also found out how to get rid of .eh_frame (crept in when I upgraded to
>> >> >> >> gcc 4.4.1) with -fno-dwarf2-cfi-asm, so that shaves another 13452 bytes
>> >> >> >>
>> >> >> >> Total saving of 15.6k
>> >> >> >
>> >> >> > Great, so now you are back at just a few percent added I guess?
>> >> >> >
>> >> >> >
>> >> >>
>> >> >> Not really - The .eh_frame saving applies to both relocated and non
>> >> >> relocated builds
>> >> >
>> >> > OK, so you didn't use PIC before at all?
>> >> >
>> >> > Anyway I think you can do more. Using -Bsymbolic you should get
>> >> > away with RELATIVE relocs only and be able to skip a lot of segments above.
>> >> > Have a look at uClibc ldso/ldso/dl-startup.c
>> >> >
>> >> >
>> >>
>> >> My build options thus far are:
>> >>
>> >> PLATFORM_RELFLAGS += -fpie -fvisibility=hidden
>> >> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>> >> PLATFORM_LDFLAGS += -pie
>> >>
>> >> -fpic / -pic make no difference
>> >
>> > not on x86, on ppc it is a big difference.
>> >
>> >>
>> >> Interestingly, -Bsymbolic adds exactly 8 bytes to .dynamic, but doesn't
>> >> change the size of any other section
>> >>
>> >> Pulling apart the relocation sections, it seems that all relocations are
>> >> already RELATIVE even without -Bsymbolic
>> >
>> > Ah, that is because you built an exe with -pie
>> > Then you should be able to drop everything but the RELATIVE
>> > from the linking, or almost in any case.
>> >
>> >  Jocke
>> >
>> >
>>
>> Hmm, so its seems I may have hit the limit. I tried:
>>
>> PLATFORM_LDFLAGS += -r --emit-relocs
>>
>> but there is not enough information left to complete the relocation. It
>> seems as though I need .rel.got, .got.plt, .dynsym and .rel.dyn in order
>> to find the actual bytes that need modifying (it also seems to mess with
>> the size of the stripped binary for some reason)
>>
>> Looks like I'll have to proceed with my original plan - a bit bloated,
>> but it works
>
> Relocation costs :(
>
> I am not sure why you need .got.plt, it should be empty,
> what is in it?
> Same with dynsym, what is in it?
>
> Memory fails me, but since u-boot is a freestanding app it I think
> these two might not be needed. Perhaps there are weak unresolved
> syms in there?
>
>     Jocke
>
>

Well, I'm in the middle of a pretty intense analysis of what is going on.

Compile flags are:

PLATFORM_RELFLAGS += -fpic -fvisibility=hidden
PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
PLATFORM_LDFLAGS += -pic -Bsymbolic

So far I have found that the only sections that have changes as a result
of a change in TEXT_BASE are:
    .text
    .rodata
    .data.rel
    .got
    .got.plt
    .rel.text
    .rel.got
    .rel.dyn
    .dynsym
    .dynamic
    .u_boot_cmd

Changes in .text are covered by .rel.text (see below) or as a result of
CONFIG_SYS_MONITOR_BASE being equal to TEXT_BASE (used in cfi_flash.c)

Changes in .rodata are a result of version_string changing for each
compile

  .rel.text
    - Contains a list of pointers into .got
    - All entries are R_386_RELATIVE
    - All entries (8 of) are in cpu/i386/start.o
    - cpu/i386/start.o only used during initial bootstrap - not needed
      after execution starts in RAM
    - Can be safely discarded

  .rel.got
    - Contains a list of pointers into .got
    - All entries are R_386_RELATIVE
    - Not all entries change with TEXT_BASE. Some entries are symbols
      exported from the linker script (in particular section size
      exports) while the others are in the somewhat 'special' BIOS and
      Real Mode sections which are located in a fixed RAM location (these
      sections are used for real-mode trampolining into Linux by providing
      a limited PC 'BIOS')
    - All entries that are not linked to TEXT_BASE are easily identified
      because they are 'located' below TEXT_BASE (specifically between
      0x00000000 and 0x00001A34)
    - This section is not needed in the final binary - Direct processing
      of .got will achieve the required end result

  .rel.dyn
    - Contains a list of pointers into .data.rel and .u_boot_cmd
    - Like .rel.got, not all entries in .data.rel need relocating. Again,
      like .rel.got, these are easily identified
    - This section is not needed

Another 5.5k saved
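
The "direct processing of .got" idea can be sketched as below. This is
only an illustration under the assumptions stated in the list above: every
.got word at or above the link-time TEXT_BASE is an address inside the
image, and everything below it is a size or a fixed-location symbol that
must be left alone. The function name and parameters are hypothetical,
not existing U-Boot code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: adjust every .got word that points at or above
 * the link-time TEXT_BASE by delta (new base minus old base, computed
 * with unsigned wrap-around). Returns the number of words patched. */
size_t relocate_got(uint32_t *got, size_t count,
                    uint32_t text_base, uint32_t delta)
{
    size_t patched = 0;

    for (size_t i = 0; i < count; i++) {
        if (got[i] >= text_base) {   /* below TEXT_BASE: not relocated */
            got[i] += delta;
            patched++;
        }
    }
    return patched;
}
```

The same threshold test would not be safe for arbitrary data sections,
where a plain integer could happen to look like an address.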

So, all that is left are .dynsym and .dynamic ...
  .dynsym
    - Contains 70 entries (16 bytes each, 1120 bytes)
    - 44 entries mimic those entries in .got which are not relocated
    - 21 entries are the remaining symbols exported from the linker
      script
    - 4 entries are labels defined in inline asm and used in C
    - 1 entry is a NULL entry

  .dynamic
    - 0x88 (136) bytes
    - Array of Elf32_Dyn
    - typedef struct {
          Elf32_Sword     d_tag;
          union {
              Elf32_Word  d_val;
              Elf32_Addr  d_ptr;
          } d_un;
      } Elf32_Dyn;
    - 0x11 entries
      [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
      [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
      [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
      [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
      [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
      [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
      [06] 0x00000015, 0x00000000 DT_DEBUG, ???
      [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
      [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
      [09] 0x00000013, 0x00000008 DT_RELENT, ???
      [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
      [0b] 0x6FFFFFFA, 0x00000236 ???, Entries in .rel.dyn
      [0c] 0x00000000, 0x00000000 DT_NULL, End of Array
      [0d] 0x00000000, 0x00000000 DT_NULL, End of Array
      [0e] 0x00000000, 0x00000000 DT_NULL, End of Array
      [0f] 0x00000000, 0x00000000 DT_NULL, End of Array
      [10] 0x00000000, 0x00000000 DT_NULL, End of Array
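
For reference, the relocation-related tags in the array above can be
picked out with a short loop. In the ELF specification DT_RELSZ is the
total size in bytes of the table DT_REL points to and DT_RELENT is the
size of one entry; tag 0x6FFFFFFA is the GNU DT_RELCOUNT tag, which counts
the relative relocations. The sketch below is an illustration only: the
type and function names (Dyn32, RelInfo, scan_dynamic) are made up, and
the union from Elf32_Dyn is flattened into a single 32-bit field:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int32_t  d_tag;
    uint32_t d_val;   /* stands in for d_un.d_val / d_un.d_ptr */
} Dyn32;

enum {
    DT_NULL = 0, DT_REL = 17, DT_RELSZ = 18, DT_RELENT = 19,
    DT_RELCOUNT = 0x6ffffffa
};

typedef struct {
    uint32_t rel_addr;     /* DT_REL: address of the relocation table */
    uint32_t rel_size;     /* DT_RELSZ: table size in bytes */
    uint32_t rel_entsize;  /* DT_RELENT: size of one entry in bytes */
    uint32_t rel_count;    /* DT_RELCOUNT: number of relative relocs */
} RelInfo;

/* Hypothetical helper: walk the array up to the first DT_NULL and
 * collect the relocation-related values. */
RelInfo scan_dynamic(const Dyn32 *dyn, size_t max_entries)
{
    RelInfo info = {0, 0, 0, 0};

    for (size_t i = 0; i < max_entries && dyn[i].d_tag != DT_NULL; i++) {
        switch (dyn[i].d_tag) {
        case DT_REL:      info.rel_addr = dyn[i].d_val;    break;
        case DT_RELSZ:    info.rel_size = dyn[i].d_val;    break;
        case DT_RELENT:   info.rel_entsize = dyn[i].d_val; break;
        case DT_RELCOUNT: info.rel_count = dyn[i].d_val;   break;
        default:          break;  /* other tags are ignored here */
        }
    }
    return info;
}
```

With the values listed above, DT_RELSZ / DT_RELENT = 0x14D8 / 8 = 667
table entries.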

I think some more investigation into the need for .dynsym and .dynamic is
still required...

Regards,

Graeme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
       [not found]                             ` <OF83D1271F.04B67606-ONC125764C.0045BFF2-C125764C.0046AC45@transmode.se>
@ 2009-10-13 11:21                               ` Graeme Russ
  2009-10-13 11:53                                 ` Joakim Tjernlund
  0 siblings, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-13 11:21 UTC (permalink / raw)
  To: u-boot

On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:

[Massive Snip :)]

>
>>
>> So, all that is left are .dynsym and .dynamic ...
>>   .dynsym
>>     - Contains 70 entries (16 bytes each, 1120 bytes)
>>     - 44 entries mimic those entries in .got which are not relocated
>>     - 21 entries are the remaining symbols exported from the linker
>>       script
>>     - 4 entries are labels defined in inline asm and used in C
> Try adding proper asm declarations. Look at what gcc
> generates for a function/variable and mimic these.

Thanks - Now .dynsym contains only exports from the linker script

>
>>     - 1 entry is a NULL entry
>>
>>   .dynamic
>>     - 88 bytes
>>     - Array of Elf32_Dyn
>>     - typedef struct {
>>           Elf32_Sword     d_tag;
>>           union {
>>               Elf32_Word  d_val;
>>               Elf32_Addr  d_ptr;
>>           } d_un;
>>       } Elf32_Dyn;
>>     - 0x11 entries
>>       [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>>       [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>>       [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>>       [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>>       [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>>       [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>>       [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>>       [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>>       [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
> How big DT_REL is
>>       [09] 0x00000013, 0x00000008 DT_RELENT, ???
> hmm, cannot remeber :)

How big an entry in DT_REL is

>>       [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
> Oops, you got text relocations. This is generally a bad thing.
> TEXTREL is commonly caused by asm code that arent truly pic so it needs
> to modify the .text segment to adjust for relocation.
> You should get rid of this one. Look for DT_TEXTREL in .o files to find
> the culprit.
>

Alas I cannot - The relocations are a result of loading a register with a
return address when calling show_boot_progress in the very early stages of
initialisation prior to the stack becoming available. The x86 does not
allow direct access to the IP so the only way to find the 'current
execution address' is to 'call' to the next instruction and pop the return
address off the stack

This is not a problem because this is very low-level init that is not
called once relocated into RAM - These relocations can be safely ignored

>>       [0b] 0x6FFFFFFA, 0x00000236 ???, Entries in .rel.dyn
>>       [0c] 0x00000000, 0x00000000 DT_NULL, End of Array
>>       [0d] 0x00000000, 0x00000000 DT_NULL, End of Array
>>       [0e] 0x00000000, 0x00000000 DT_NULL, End of Array
>>       [0f] 0x00000000, 0x00000000 DT_NULL, End of Array
>>       [10] 0x00000000, 0x00000000 DT_NULL, End of Array
>>
>> I think some more investigation into the need for .dynsym and .dynamic is
>> still required...

.dynsym may still be required if only for accessing the __u_boot_cmd
structure. However, I may be able to hack that a little and not create a
__u_boot_cmd symbol in the linker script (create some other temporary
symbol) and populate __u_boot_cmd with a valid value after relocation. It
will look a little weird, but may mean not loading this section into RAM

Other than that, .dynsym is now only needed to locate the sections during
the relocation phase and can be kept in flash and not copied to RAM

I don't think .dynamic is needed due to the exporting of section addresses
from the linker script

Regards,

Graeme

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-13 11:21                               ` Graeme Russ
@ 2009-10-13 11:53                                 ` Joakim Tjernlund
  2009-10-13 16:30                                   ` J. William Campbell
  2009-10-13 20:06                                   ` Graeme Russ
  0 siblings, 2 replies; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-13 11:53 UTC (permalink / raw)
  To: u-boot

Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
> <joakim.tjernlund@transmode.se> wrote:
> > Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>
> [Massive Snip :)]
>
> >
> >>
> >> So, all that is left are .dynsym and .dynamic ...
> >>   .dynsym
> >>     - Contains 70 entries (16 bytes each, 1120 bytes)
> >>     - 44 entries mimic those entries in .got which are not relocated
> >>     - 21 entries are the remaining symbols exported from the linker
> >>       script
> >>     - 4 entries are labels defined in inline asm and used in C
> > Try adding proper asm declarations. Look at what gcc
> > generates for a function/variable and mimic these.
>
> Thanks - Now .dynsym contains only exports from the linker script
:)
>
> >
> >>     - 1 entry is a NULL entry
> >>
> >>   .dynamic
> >>     - 88 bytes
> >>     - Array of Elf32_Dyn
> >>     - typedef struct {
> >>           Elf32_Sword     d_tag;
> >>           union {
> >>               Elf32_Word  d_val;
> >>               Elf32_Addr  d_ptr;
> >>           } d_un;
> >>       } Elf32_Dyn;
> >>     - 0x11 entries
> >>       [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
> >>       [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
> >>       [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
> >>       [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
> >>       [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
> >>       [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
> >>       [06] 0x00000015, 0x00000000 DT_DEBUG, ???
> >>       [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
> >>       [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
> > How big DT_REL is
> >>       [09] 0x00000013, 0x00000008 DT_RELENT, ???
> > hmm, cannot remeber :)
>
> How big an entry in DT_REL is

Right, how could I forget :)
>
> >>       [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
> > Oops, you got text relocations. This is generally a bad thing.
> > TEXTREL is commonly caused by asm code that arent truly pic so it needs
> > to modify the .text segment to adjust for relocation.
> > You should get rid of this one. Look for DT_TEXTREL in .o files to find
> > the culprit.
> >
>
> Alas I cannot - The relocations are a result of loading a register with a
> return address when calling show_boot_progress in the very early stages of
> initialisation prior to the stack becoming available. The x86 does not
> allow direct access to the IP so the only way to find the 'current
> execution address' is to 'call' to the next instruction and pop the return
> address off the stack

Hmm, same as ppc, but that in itself should not cause a TEXTREL, should it?
Ahh, the 'call' is absolute, not relative? I guess there is some way around it,
but it is not important ATM.

Evil idea: skip -fpic et al. and add the full reloc procedure,
relocating by rewriting directly in the TEXT segment. Then you save space
but you need more relocation code. Something like dl_do_reloc from
uClibc. I wonder how much extra code that would be? Not too much, I think.
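
A rough sketch of what such a procedure could look like for i386 REL
entries. This is not uClibc's code, just an illustration under loud
assumptions: the image was linked at link_base, the relocation table is
available as an array, and every R_386_32 / R_386_RELATIVE site holds an
address inside the image. Absolute symbols (e.g. sizes exported from the
linker script) and references into fixed-location sections would need
extra handling that is left out. All names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t r_offset;  /* link-time address of the word to patch */
    uint32_t r_info;    /* (symbol index << 8) | relocation type */
} Rel32;

enum { R_386_32 = 1, R_386_PC32 = 2, R_386_RELATIVE = 8 };

/* Hypothetical sketch: add delta to every absolute address recorded in
 * the table. Returns the number of words patched, or -1 on an
 * unsupported relocation type or an offset outside the image. */
int apply_relocs(uint8_t *image, uint32_t image_size, uint32_t link_base,
                 const Rel32 *rel, size_t count, uint32_t delta)
{
    int patched = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t type = rel[i].r_info & 0xffu;
        uint32_t off = rel[i].r_offset - link_base;
        uint32_t word;

        if (type == R_386_PC32)
            continue;   /* PC-relative: both ends move together */
        if (type != R_386_32 && type != R_386_RELATIVE)
            return -1;
        if (image_size < 4 || off > image_size - 4)
            return -1;

        memcpy(&word, image + off, sizeof word);  /* may be unaligned */
        word += delta;
        memcpy(image + off, &word, sizeof word);
        patched++;
    }
    return patched;
}
```

The loop itself is small; most of the real work would be in deciding
which entries must not be touched.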

>
> This is not a problem because this is very low-level init that is not
> called once relocated into RAM - These relocations can be safely ignored

>
> >>       [0b] 0x6FFFFFFA, 0x00000236 ???, Entries in .rel.dyn
> >>       [0c] 0x00000000, 0x00000000 DT_NULL, End of Array
> >>       [0d] 0x00000000, 0x00000000 DT_NULL, End of Array
> >>       [0e] 0x00000000, 0x00000000 DT_NULL, End of Array
> >>       [0f] 0x00000000, 0x00000000 DT_NULL, End of Array
> >>       [10] 0x00000000, 0x00000000 DT_NULL, End of Array
> >>
> >> I think some more investigation into the need for .dynsym and .dynamic is
> >> still required...
>
> .dynsym may still be required if only for accessing the __u_boot_cmd
> structure. However, I may be able to hack that a little and not create a
> __u_boot_cmd symbol in the linker script (create some other temporary
> symbol) and populate __u_boot_cmd with a valid value after relocation. It
> will look a little weird, but may mean not loading this section into RAM

Why do you need to muck around with u_boot_cmd at all? Now that relocation
works you should be able to drop all that code/linker stuff?

>
> Other than that, .dynsym is now only needed to locate the sections during
> the relocation phase and can be kept in flash and not copied to RAM

Still occupies space in the *bin image though.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-13 11:53                                 ` Joakim Tjernlund
@ 2009-10-13 16:30                                   ` J. William Campbell
  2009-10-13 16:55                                     ` Joakim Tjernlund
  2009-10-13 20:06                                   ` Graeme Russ
  1 sibling, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-13 16:30 UTC (permalink / raw)
  To: u-boot

Joakim Tjernlund wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>   
>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>>     
>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>>       
>> [Massive Snip :)]
>>
>>     
>>>> So, all that is left are .dynsym and .dynamic ...
>>>>   .dynsym
>>>>     - Contains 70 entries (16 bytes each, 1120 bytes)
>>>>     - 44 entries mimic those entries in .got which are not relocated
>>>>     - 21 entries are the remaining symbols exported from the linker
>>>>       script
>>>>     - 4 entries are labels defined in inline asm and used in C
>>>>         
>>> Try adding proper asm declarations. Look at what gcc
>>> generates for a function/variable and mimic these.
>>>       
>> Thanks - Now .dynsym contains only exports from the linker script
>>     
> :)
>   
>>>>     - 1 entry is a NULL entry
>>>>
>>>>   .dynamic
>>>>     - 88 bytes
>>>>     - Array of Elf32_Dyn
>>>>     - typedef struct {
>>>>           Elf32_Sword     d_tag;
>>>>           union {
>>>>               Elf32_Word  d_val;
>>>>               Elf32_Addr  d_ptr;
>>>>           } d_un;
>>>>       } Elf32_Dyn;
>>>>     - 0x11 entries
>>>>       [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>>>>       [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>>>>       [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>>>>       [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>>>>       [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>>>>       [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>>>>       [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>>>>       [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>>>>       [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>>>>         
>>> How big DT_REL is
>>>       
>>>>       [09] 0x00000013, 0x00000008 DT_RELENT, ???
>>>>         
>>> hmm, cannot remeber :)
>>>       
>> How big an entry in DT_REL is
>>     
>
> Right, how could I forget :)
>   
>>>>       [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>>>>         
>>> Oops, you got text relocations. This is generally a bad thing.
>>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
>>> to modify the .text segment to adjust for relocation.
>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
>>> the culprit.
>>>
>>>       
>> Alas I cannot - The relocations are a result of loading a register with a
>> return address when calling show_boot_progress in the very early stages of
>> initialisation prior to the stack becoming available. The x86 does not
>> allow direct access to the IP so the only way to find the 'current
>> execution address' is to 'call' to the next instruction and pop the return
>> address off the stack
>>     
>
> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
> but it is not important ATM I guess.
>
> Evil idea, skip -fpic et. all and add the full reloc procedure
> to relocate by rewriting directly in TEXT segment. Then you save space
> but you need more relocation code. Something like dl_do_reloc from
> uClibc. Wonder how much extra code that would be? Not too much I think.
>   
I think this approach will turn out to be a big win. At present, the
problem with just using the relocs is that objcopy is stripping them out
when u-boot.bin is created, as I understand it. It seems this can be
solved by changing the command switches appropriately, like using
--strip-unneeded. In any case, there is some combination of switches
that will preserve the relocation data. The executable code will get
smaller, there will be no .got, and the relocation data will be larger
(than with -fpic). In total size, it probably will be slightly smaller,
but that is a guess. The most important benefit of this approach is that
it will work for all architectures, thereby solving the problem once and
forever! Even if the result is a bit larger, the RAM footprint will be
reduced by the smaller object code size (since the relocation data need
not be copied into RAM). Having this approach as an option would be really
nice, since it would always "just work".

Best Regards,
Bill Campbell
>   
>> This is not a problem because this is very low-level init that is not
>> called once relocated into RAM - These relocations can be safely ignored
>>     
>
>   
>>>>       [0b] 0x6FFFFFFA, 0x00000236 ???, Entries in .rel.dyn
>>>>       [0c] 0x00000000, 0x00000000 DT_NULL, End of Array
>>>>       [0d] 0x00000000, 0x00000000 DT_NULL, End of Array
>>>>       [0e] 0x00000000, 0x00000000 DT_NULL, End of Array
>>>>       [0f] 0x00000000, 0x00000000 DT_NULL, End of Array
>>>>       [10] 0x00000000, 0x00000000 DT_NULL, End of Array
>>>>
>>>> I think some more investigation into the need for .dynsym and .dynamic is
>>>> still required...
>>>>         
>> .dynsym may still be required if only for accessing the __u_boot_cmd
>> structure. However, I may be able to hack that a little and not create a
>> __u_boot_cmd symbol in the linker script (create some other temporary
>> symbol) and populate __u_boot_cmd with a valid value after relocation. It
>> will look a little weird, but may mean not loading this section into RAM
>>     
>
> Why do you need to much around with u_boot_cmd at all? Now that relocation
> works you should be able to drop all that code/linker stuff?
>
>   
>> Other than that, .dynsym is now only needed to locate the sections during
>> the relocation phase and can be kept in flash and not copied to RAM
>>     
>
> Still occupies space in the *bin image though.
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-13 16:30                                   ` J. William Campbell
@ 2009-10-13 16:55                                     ` Joakim Tjernlund
  0 siblings, 0 replies; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-13 16:55 UTC (permalink / raw)
  To: u-boot

"J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 13/10/2009 18:30:43:
>
> Joakim Tjernlund wrote:
> > Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
> >
> >> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
> >> <joakim.tjernlund@transmode.se> wrote:
> >>
> >>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
> >>>
> >> [Massive Snip :)]
> >>
> >>
> >>>> So, all that is left are .dynsym and .dynamic ...
> >>>>   .dynsym
> >>>>     - Contains 70 entries (16 bytes each, 1120 bytes)
> >>>>     - 44 entries mimic those entries in .got which are not relocated
> >>>>     - 21 entries are the remaining symbols exported from the linker
> >>>>       script
> >>>>     - 4 entries are labels defined in inline asm and used in C
> >>>>
> >>> Try adding proper asm declarations. Look at what gcc
> >>> generates for a function/variable and mimic these.
> >>>
> >> Thanks - Now .dynsym contains only exports from the linker script
> >>
> > :)
> >
> >>>>     - 1 entry is a NULL entry
> >>>>
> >>>>   .dynamic
> >>>>     - 88 bytes
> >>>>     - Array of Elf32_Dyn
> >>>>     - typedef struct {
> >>>>           Elf32_Sword     d_tag;
> >>>>           union {
> >>>>               Elf32_Word  d_val;
> >>>>               Elf32_Addr  d_ptr;
> >>>>           } d_un;
> >>>>       } Elf32_Dyn;
> >>>>     - 0x11 entries
> >>>>       [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
> >>>>       [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
> >>>>       [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
> >>>>       [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
> >>>>       [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
> >>>>       [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
> >>>>       [06] 0x00000015, 0x00000000 DT_DEBUG, ???
> >>>>       [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
> >>>>       [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
> >>>>
> >>> How big DT_REL is
> >>>
> >>>>       [09] 0x00000013, 0x00000008 DT_RELENT, ???
> >>>>
> >>> hmm, cannot remeber :)
> >>>
> >> How big an entry in DT_REL is
> >>
> >
> > Right, how could I forget :)
> >
> >>>>       [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
> >>>>
> >>> Oops, you got text relocations. This is generally a bad thing.
> >>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
> >>> to modify the .text segment to adjust for relocation.
> >>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
> >>> the culprit.
> >>>
> >>>
> >> Alas I cannot - The relocations are a result of loading a register with a
> >> return address when calling show_boot_progress in the very early stages of
> >> initialisation prior to the stack becoming available. The x86 does not
> >> allow direct access to the IP so the only way to find the 'current
> >> execution address' is to 'call' to the next instruction and pop the return
> >> address off the stack
> >>
> >
> > hmm, same as ppc but that in it self should not cause a TEXREL, should it?
> > Ahh, the 'call' is absolute, not relative? I guess there is some way around it
> > but it is not important ATM I guess.
> >
> > Evil idea, skip -fpic et. all and add the full reloc procedure
> > to relocate by rewriting directly in TEXT segment. Then you save space
> > but you need more relocation code. Something like dl_do_reloc from
> > uClibc. Wonder how much extra code that would be? Not too much I think.
> >
> I think this approach will turn out to be a big win. At present, the
> problem with just using the relocs is that objcopy is stripping them out
> when u-boot.bin is created, as I understand it. It seems this can be
> solved by changing the command switches appropriately, like using
> --strip-unneeded. In any case, there is some combination of switches
> that will preserve the relocation data. The executable code will get
> smaller, there will be no .got, and the relocation data will be larger
> (than with -fpic). In total size, it probably will be slightly smaller,
> but that is a guess. The most important benefit of this approach is that
> it will work for all architectures, thereby solving the problem once and
> forever! Even if the result is a bit larger, the RAM footprint will be
> reduced by the smaller object code size (since the relocation data need
> not be copied into ram).Having this approach as an option would be real
> nice, since it would always "just work".

Yes, I had this in the back of my head. I do think some arch other than ppc
will have to try it out first, though :)
I am not 100% sure it will work with my end goal: true PIC, so that I can load
the same image anywhere in flash.

 Jocke

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-13 11:53                                 ` Joakim Tjernlund
  2009-10-13 16:30                                   ` J. William Campbell
@ 2009-10-13 20:06                                   ` Graeme Russ
       [not found]                                     ` <OF32A18F38.511FF11C-ONC125764E.00750716-C125764E.007534EE@ <4AD511E4.9090204@comcast.net>
  2009-10-13 21:20                                     ` Joakim Tjernlund
  1 sibling, 2 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-13 20:06 UTC (permalink / raw)
  To: u-boot

On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>> > Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>
>> [Massive Snip :)]
>>
>> >
>> >>
>> >> So, all that is left are .dynsym and .dynamic ...
>> >>   .dynsym
>> >>     - Contains 70 entries (16 bytes each, 1120 bytes)
>> >>     - 44 entries mimic those entries in .got which are not relocated
>> >>     - 21 entries are the remaining symbols exported from the linker
>> >>       script
>> >>     - 4 entries are labels defined in inline asm and used in C
>> > Try adding proper asm declarations. Look at what gcc
>> > generates for a function/variable and mimic these.
>>
>> Thanks - Now .dynsym contains only exports from the linker script
> :)
>>
>> >
>> >>     - 1 entry is a NULL entry
>> >>
>> >>   .dynamic
>> >>     - 88 bytes
>> >>     - Array of Elf32_Dyn
>> >>     - typedef struct {
>> >>           Elf32_Sword     d_tag;
>> >>           union {
>> >>               Elf32_Word  d_val;
>> >>               Elf32_Addr  d_ptr;
>> >>           } d_un;
>> >>       } Elf32_Dyn;
>> >>     - 0x11 entries
>> >>       [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>> >>       [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>> >>       [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>> >>       [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>> >>       [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>> >>       [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>> >>       [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>> >>       [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>> >>       [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>> > How big DT_REL is
>> >>       [09] 0x00000013, 0x00000008 DT_RELENT, ???
>> > hmm, cannot remeber :)
>>
>> How big an entry in DT_REL is
>
> Right, how could I forget :)
>>
>> >>       [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>> > Oops, you got text relocations. This is generally a bad thing.
>> > TEXTREL is commonly caused by asm code that arent truly pic so it needs
>> > to modify the .text segment to adjust for relocation.
>> > You should get rid of this one. Look for DT_TEXTREL in .o files to find
>> > the culprit.
>> >
>>
>> Alas I cannot - The relocations are a result of loading a register with a
>> return address when calling show_boot_progress in the very early stages of
>> initialisation prior to the stack becoming available. The x86 does not
>> allow direct access to the IP so the only way to find the 'current
>> execution address' is to 'call' to the next instruction and pop the return
>> address off the stack
>
> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
> but it is not important ATM I guess.
>
> Evil idea, skip -fpic et. all and add the full reloc procedure
> to relocate by rewriting directly in TEXT segment. Then you save space
> but you need more relocation code. Something like dl_do_reloc from
> uClibc. Wonder how much extra code that would be? Not too much I think.
>

With the following flags

PLATFORM_RELFLAGS += -fvisibility=hidden
PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions

I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
this might mean I need the symbol table in the binary in order to resolve
them.
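
As a rough illustration of why the symbol table comes into question: each
entry in a REL section carries a symbol index and a relocation type packed
into its r_info word (upper 24 bits and low 8 bits respectively, per the ELF
specification). The sketch below decodes that word. The struct and function
names (rel_entry, rel_sym, rel_type) are hypothetical, and the value in the
usage note is only an example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local mirror of the ELF32 REL entry layout. */
typedef struct {
    uint32_t r_offset;  /* address of the field to patch */
    uint32_t r_info;    /* symbol index (upper 24 bits), type (low 8 bits) */
} rel_entry;

/* i386 relocation type numbers from the ELF specification. */
enum { R_386_32_TYPE = 1, R_386_PC32_TYPE = 2 };

/* Index into the symbol table that this relocation refers to. */
uint32_t rel_sym(uint32_t r_info)
{
    return r_info >> 8;
}

/* Relocation type, e.g. R_386_32 or R_386_PC32. */
uint32_t rel_type(uint32_t r_info)
{
    return r_info & 0xffu;
}
```

For an r_info of 0x00016201, rel_sym() gives symbol index 0x162 and
rel_type() gives 1 (R_386_32). Whether the symbol itself has to be looked up
at run time depends on how the relocation is applied.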

>>
>> This is not a problem because this is very low-level init that is not
>> called once relocated into RAM - These relocations can be safely ignored
>
>>
>> >>       [0b] 0x6FFFFFFA, 0x00000236 ???, Entries in .rel.dyn
>> >>       [0c] 0x00000000, 0x00000000 DT_NULL, End of Array
>> >>       [0d] 0x00000000, 0x00000000 DT_NULL, End of Array
>> >>       [0e] 0x00000000, 0x00000000 DT_NULL, End of Array
>> >>       [0f] 0x00000000, 0x00000000 DT_NULL, End of Array
>> >>       [10] 0x00000000, 0x00000000 DT_NULL, End of Array
>> >>
>> >> I think some more investigation into the need for .dynsym and .dynamic is
>> >> still required...
>>
>> .dynsym may still be required if only for accessing the __u_boot_cmd
>> structure. However, I may be able to hack that a little and not create a
>> __u_boot_cmd symbol in the linker script (create some other temporary
>> symbol) and populate __u_boot_cmd with a valid value after relocation. It
>> will look a little weird, but may mean not loading this section into RAM
>
> Why do you need to much around with u_boot_cmd at all? Now that relocation
> works you should be able to drop all that code/linker stuff?
>
>>
>> Other than that, .dynsym is now only needed to locate the sections during
>> the relocation phase and can be kept in flash and not copied to RAM
>
> Still occupies space in the *bin image though.
>
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-13 20:06                                   ` Graeme Russ
       [not found]                                     ` <OF32A18F38.511FF11C-ONC125764E.00750716-C125764E.007534EE@ <4AD511E4.9090204@comcast.net>
@ 2009-10-13 21:20                                     ` Joakim Tjernlund
  2009-10-13 23:48                                       ` J. William Campbell
  1 sibling, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-13 21:20 UTC (permalink / raw)
  To: u-boot

Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:

>
> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
> <joakim.tjernlund@transmode.se> wrote:
> > Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
> >> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
> >> <joakim.tjernlund@transmode.se> wrote:
> >> > Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
> >>
> >> [Massive Snip :)]
> >>
> >> >
> >> >>
> >> >> So, all that is left are .dynsym and .dynamic ...
> >> >>   .dynsym
> >> >>     - Contains 70 entries (16 bytes each, 1120 bytes)
> >> >>     - 44 entries mimic those entries in .got which are not relocated
> >> >>     - 21 entries are the remaining symbols exported from the linker
> >> >>       script
> >> >>     - 4 entries are labels defined in inline asm and used in C
> >> > Try adding proper asm declarations. Look at what gcc
> >> > generates for a function/variable and mimic these.
> >>
> >> Thanks - Now .dynsym contains only exports from the linker script
> > :)
> >>
> >> >
> >> >>     - 1 entry is a NULL entry
> >> >>
> >> >>   .dynamic
> >> >>     - 88 bytes
> >> >>     - Array of Elf32_Dyn
> >> >>     - typedef struct {
> >> >>           Elf32_Sword     d_tag;
> >> >>           union {
> >> >>               Elf32_Word  d_val;
> >> >>               Elf32_Addr  d_ptr;
> >> >>           } d_un;
> >> >>       } Elf32_Dyn;
> >> >>     - 0x11 entries
> >> >>       [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
> >> >>       [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
> >> >>       [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
> >> >>       [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
> >> >>       [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
> >> >>       [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
> >> >>       [06] 0x00000015, 0x00000000 DT_DEBUG, ???
> >> >>       [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
> >> >>       [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
> >> > How big DT_REL is
> >> >>       [09] 0x00000013, 0x00000008 DT_RELENT, ???
> >> > hmm, cannot remeber :)
> >>
> >> How big an entry in DT_REL is
> >
> > Right, how could I forget :)
> >>
> >> >>       [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
> >> > Oops, you got text relocations. This is generally a bad thing.
> >> > TEXTREL is commonly caused by asm code that arent truly pic so it needs
> >> > to modify the .text segment to adjust for relocation.
> >> > You should get rid of this one. Look for DT_TEXTREL in .o files to find
> >> > the culprit.
> >> >
> >>
> >> Alas I cannot - The relocations are a result of loading a register with a
> >> return address when calling show_boot_progress in the very early stages of
> >> initialisation prior to the stack becoming available. The x86 does not
> >> allow direct access to the IP so the only way to find the 'current
> >> execution address' is to 'call' to the next instruction and pop the return
> >> address off the stack
> >
> > hmm, same as ppc but that in it self should not cause a TEXREL, should it?
> > Ahh, the 'call' is absolute, not relative? I guess there is some way around it
> > but it is not important ATM I guess.
> >
> > Evil idea, skip -fpic et. all and add the full reloc procedure
> > to relocate by rewriting directly in TEXT segment. Then you save space
> > but you need more relocation code. Something like dl_do_reloc from
> > uClibc. Wonder how much extra code that would be? Not too much I think.
> >
>
> With the following flags
>
> PLATFORM_RELFLAGS += -fvisibility=hidden
> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
>
> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
> this might mean I need the symbol table in the binary in order to resolve
> them

Possibly, but I think you only need to add an offset to all those
relocs.

  Jocke

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-13 21:20                                     ` Joakim Tjernlund
@ 2009-10-13 23:48                                       ` J. William Campbell
  2009-10-14  7:25                                         ` Joakim Tjernlund
  0 siblings, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-13 23:48 UTC (permalink / raw)
  To: u-boot

Joakim Tjernlund wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
>
>   
>> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>>     
>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>>>       
>>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>         
>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>>>>           
>>>> [Massive Snip :)]
>>>>
>>>>         
>>>>>> So, all that is left are .dynsym and .dynamic ...
>>>>>>   .dynsym
>>>>>>     - Contains 70 entries (16 bytes each, 1120 bytes)
>>>>>>     - 44 entries mimic those entries in .got which are not relocated
>>>>>>     - 21 entries are the remaining symbols exported from the linker
>>>>>>       script
>>>>>>     - 4 entries are labels defined in inline asm and used in C
>>>>>>             
>>>>> Try adding proper asm declarations. Look at what gcc
>>>>> generates for a function/variable and mimic these.
>>>>>           
>>>> Thanks - Now .dynsym contains only exports from the linker script
>>>>         
>>> :)
>>>       
>>>>>>     - 1 entry is a NULL entry
>>>>>>
>>>>>>   .dynamic
>>>>>>     - 88 bytes
>>>>>>     - Array of Elf32_Dyn
>>>>>>     - typedef struct {
>>>>>>           Elf32_Sword     d_tag;
>>>>>>           union {
>>>>>>               Elf32_Word  d_val;
>>>>>>               Elf32_Addr  d_ptr;
>>>>>>           } d_un;
>>>>>>       } Elf32_Dyn;
>>>>>>     - 0x11 entries
>>>>>>       [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>>>>>>       [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>>>>>>       [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>>>>>>       [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>>>>>>       [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>>>>>>       [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>>>>>>       [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>>>>>>       [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>>>>>>       [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>>>>>>             
>>>>> How big DT_REL is
>>>>>           
>>>>>>       [09] 0x00000013, 0x00000008 DT_RELENT, ???
>>>>>>             
>>>>> hmm, cannot remeber :)
>>>>>           
>>>> How big an entry in DT_REL is
>>>>         
>>> Right, how could I forget :)
>>>       
>>>>>>       [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>>>>>>             
>>>>> Oops, you got text relocations. This is generally a bad thing.
>>>>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
>>>>> to modify the .text segment to adjust for relocation.
>>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
>>>>> the culprit.
>>>>>
>>>>>           
>>>> Alas I cannot - The relocations are a result of loading a register with a
>>>> return address when calling show_boot_progress in the very early stages of
>>>> initialisation prior to the stack becoming available. The x86 does not
>>>> allow direct access to the IP so the only way to find the 'current
>>>> execution address' is to 'call' to the next instruction and pop the return
>>>> address off the stack
>>>>         
>>> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
>>> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
>>> but it is not important ATM I guess.
>>>
>>> Evil idea, skip -fpic et. all and add the full reloc procedure
>>> to relocate by rewriting directly in TEXT segment. Then you save space
>>> but you need more relocation code. Something like dl_do_reloc from
>>> uClibc. Wonder how much extra code that would be? Not too much I think.
>>>
>>>       
>> With the following flags
>>
>> PLATFORM_RELFLAGS += -fvisibility=hidden
>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
>>
>> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
>> this might mean I need the symbol table in the binary in order to resolve
>> them
>>     
>
> Possibly, but I think you only need to add an offset to all those
> relocs.
>   
Almost right. The relocations specify a symbol value that needs to be
added to the data in memory to relocate the reference. The symbol values
involved should be the start of the text section for program references,
the start of the uninitialized data section for bss references, and the
start of the data section for initialized data and constants. So there
are about four symbols whose values you need to keep. Take a look at
http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
already looked at); it tells you what to do with R_386_PC32 and
R_386_32 relocations. Hopefully objcopy with --strip-unneeded
will remove all the symbols you don't actually need, but I don't know
that for sure. Note also that you can change the section flags of a
section marked noload to load.
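
For reference, the i386 rules in the ELF specification are
R_386_32 = S + A and R_386_PC32 = S + A - P, where S is the symbol value,
A is the addend (held in the field itself for REL-format entries) and P is
the address of the field being patched. A minimal sketch of that arithmetic
follows. The function name reloc_value and the numbers in the usage note are
illustrative assumptions, not anything taken from the U-Boot tree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the value to store for one i386 relocation.
 *   type: 1 = R_386_32, 2 = R_386_PC32 (numbers from the ELF spec)
 *   S: symbol value, A: addend, P: address of the patched field
 * Arithmetic is modulo 2^32, which is what unsigned 32-bit math gives.
 */
uint32_t reloc_value(uint32_t type, uint32_t S, uint32_t A, uint32_t P)
{
    switch (type) {
    case 1:                     /* R_386_32: absolute address */
        return S + A;
    case 2:                     /* R_386_PC32: PC-relative displacement */
        return S + A - P;
    default:
        assert(0 && "relocation type not handled in this sketch");
        return 0;
    }
}
```

With S = 0x2000 and A = 0x10, R_386_32 yields 0x2010. With S = 0x1000,
A = -4 (0xfffffffc) and P = 0x0ff0, R_386_PC32 yields 0xc.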

Best Regards,
Bill Campbell
>   Jokce

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-13 23:48                                       ` J. William Campbell
@ 2009-10-14  7:25                                         ` Joakim Tjernlund
  2009-10-14 11:48                                           ` Graeme Russ
  2009-10-14 15:35                                           ` J. William Campbell
  0 siblings, 2 replies; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-14  7:25 UTC (permalink / raw)
  To: u-boot

"J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
>
> Joakim Tjernlund wrote:
> > Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
> >
> >
> >> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
> >> <joakim.tjernlund@transmode.se> wrote:
> >>
> >>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
> >>>
> >>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
> >>>> <joakim.tjernlund@transmode.se> wrote:
> >>>>
> >>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
> >>>>>
> >>>> [Massive Snip :)]
> >>>>
> >>>>
> >>>>>> So, all that is left are .dynsym and .dynamic ...
> >>>>>>   .dynsym
> >>>>>>     - Contains 70 entries (16 bytes each, 1120 bytes)
> >>>>>>     - 44 entries mimic those entries in .got which are not relocated
> >>>>>>     - 21 entries are the remaining symbols exported from the linker
> >>>>>>       script
> >>>>>>     - 4 entries are labels defined in inline asm and used in C
> >>>>>>
> >>>>> Try adding proper asm declarations. Look at what gcc
> >>>>> generates for a function/variable and mimic these.
> >>>>>
> >>>> Thanks - Now .dynsym contains only exports from the linker script
> >>>>
> >>> :)
> >>>
> >>>>>>     - 1 entry is a NULL entry
> >>>>>>
> >>>>>>   .dynamic
> >>>>>>     - 88 bytes
> >>>>>>     - Array of Elf32_Dyn
> >>>>>>     - typedef struct {
> >>>>>>           Elf32_Sword     d_tag;
> >>>>>>           union {
> >>>>>>               Elf32_Word  d_val;
> >>>>>>               Elf32_Addr  d_ptr;
> >>>>>>           } d_un;
> >>>>>>       } Elf32_Dyn;
> >>>>>>     - 0x11 entries
> >>>>>>       [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
> >>>>>>       [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
> >>>>>>       [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
> >>>>>>       [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
> >>>>>>       [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
> >>>>>>       [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
> >>>>>>       [06] 0x00000015, 0x00000000 DT_DEBUG, ???
> >>>>>>       [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
> >>>>>>       [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
> >>>>>>
> >>>>> How big DT_REL is
> >>>>>
> >>>>>>       [09] 0x00000013, 0x00000008 DT_RELENT, ???
> >>>>>>
> >>>>> hmm, cannot remeber :)
> >>>>>
> >>>> How big an entry in DT_REL is
> >>>>
> >>> Right, how could I forget :)
> >>>
> >>>>>>       [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
> >>>>>>
> >>>>> Oops, you got text relocations. This is generally a bad thing.
> >>>>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
> >>>>> to modify the .text segment to adjust for relocation.
> >>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
> >>>>> the culprit.
> >>>>>
> >>>>>
> >>>> Alas I cannot - The relocations are a result of loading a register with a
> >>>> return address when calling show_boot_progress in the very early stages of
> >>>> initialisation prior to the stack becoming available. The x86 does not
> >>>> allow direct access to the IP so the only way to find the 'current
> >>>> execution address' is to 'call' to the next instruction and pop the return
> >>>> address off the stack
> >>>>
> >>> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
> >>> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
> >>> but it is not important ATM I guess.
> >>>
> >>> Evil idea, skip -fpic et. all and add the full reloc procedure
> >>> to relocate by rewriting directly in TEXT segment. Then you save space
> >>> but you need more relocation code. Something like dl_do_reloc from
> >>> uClibc. Wonder how much extra code that would be? Not too much I think.
> >>>
> >>>
> >> With the following flags
> >>
> >> PLATFORM_RELFLAGS += -fvisibility=hidden
> >> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
> >> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
> >>
> >> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
> >> this might mean I need the symbol table in the binary in order to resolve
> >> them
> >>

BTW, how many relocs do you get compared with -fPIC? I suspect you get more
now, but hopefully not that many more.

> >
> > Possibly, but I think you only need to add an offset to all those
> > relocs.
> >
> Almost right. The relocations specify a symbol value that needs to be
> added to the data in memory to relocate the reference. The symbol values
> involved should be the start of the text section for program references,
> the start of the uninitialized data section for bss references, and the
> start of the data section for initialized data and constants. So there
> are about four symbols whose value you need to keep. Take a look at
> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
> already looked at) and it tells you what to do with R_386_PC32 ad
> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
> will remove all the symbols you don't actually need, but I don't know
> that for sure. Note also that you can change the section flags of a
> section marked noload  to load.

I still think you can get away with just ADDING an offset. The image is linked to a
specific address and then you move the whole image to a new address. Therefore
you should be able to read the current address, add the offset, and write back the new address.

Normally one does what you describe, but here we know that the whole image has moved, so
we don't have to calculate the new address from scratch.
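
A minimal sketch of that read / add offset / write back loop, assuming the
whole image moves as one block and that every R_386_32 target lies inside it.
All names here (rel_entry, rd32le, wr32le, relocate_image) are hypothetical
and the image is modelled as a byte buffer so the code can run on a host.
R_386_PC32 entries are skipped on the assumption that both ends of the
reference move by the same amount.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of an ELF32 REL entry. */
typedef struct {
    uint32_t r_offset;  /* link-time address of the field to patch */
    uint32_t r_info;    /* low 8 bits hold the relocation type */
} rel_entry;

/* Read a 32-bit little-endian value (x86 byte order). */
uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Write a 32-bit little-endian value. */
void wr32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/*
 * Add delta (new base minus link base) to every R_386_32 field.
 * image holds the bytes that were linked to run at link_base.
 */
void relocate_image(uint8_t *image, size_t image_size, uint32_t link_base,
                    const rel_entry *rel, size_t count, uint32_t delta)
{
    for (size_t i = 0; i < count; i++) {
        if ((rel[i].r_info & 0xffu) != 1u)      /* 1 == R_386_32 */
            continue;

        uint32_t pos = rel[i].r_offset - link_base;
        assert(image_size >= 4 && pos <= image_size - 4);

        /* read the current address, add the offset, write it back */
        wr32le(image + pos, rd32le(image + pos) + delta);
    }
}
```

No symbol lookup is needed in this form: only the relocation type and the
field address are read from each entry.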

       Jocke

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-14  7:25                                         ` Joakim Tjernlund
@ 2009-10-14 11:48                                           ` Graeme Russ
  2009-10-14 12:38                                             ` Joakim Tjernlund
  2009-10-14 15:35                                           ` J. William Campbell
  1 sibling, 1 reply; 47+ messages in thread
From: Graeme Russ @ 2009-10-14 11:48 UTC (permalink / raw)
  To: u-boot

On Wed, Oct 14, 2009 at 6:25 PM, Joakim Tjernlund
<joakim.tjernlund@transmode.se> wrote:
> "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
>>
>> Joakim Tjernlund wrote:
>> > Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
>> >
>> >
>> >> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
>> >> <joakim.tjernlund@transmode.se> wrote:
>> >>
>> >>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>> >>>
>> >>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>> >>>> <joakim.tjernlund@transmode.se> wrote:
>> >>>>
>> >>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>> >>>>>
>> >>>> [Massive Snip :)]
>> >>>>
>> >>>>
>> >>>>>> So, all that is left are .dynsym and .dynamic ...
>> >>>>>>   .dynsym
>> >>>>>>     - Contains 70 entries (16 bytes each, 1120 bytes)
>> >>>>>>     - 44 entries mimic those entries in .got which are not relocated
>> >>>>>>     - 21 entries are the remaining symbols exported from the linker
>> >>>>>>       script
>> >>>>>>     - 4 entries are labels defined in inline asm and used in C
>> >>>>>>
>> >>>>> Try adding proper asm declarations. Look at what gcc
>> >>>>> generates for a function/variable and mimic these.
>> >>>>>
>> >>>> Thanks - Now .dynsym contains only exports from the linker script
>> >>>>
>> >>> :)
>> >>>
>> >>>>>>     - 1 entry is a NULL entry
>> >>>>>>
>> >>>>>>   .dynamic
>> >>>>>>     - 88 bytes
>> >>>>>>     - Array of Elf32_Dyn
>> >>>>>>     - typedef struct {
>> >>>>>>           Elf32_Sword     d_tag;
>> >>>>>>           union {
>> >>>>>>               Elf32_Word  d_val;
>> >>>>>>               Elf32_Addr  d_ptr;
>> >>>>>>           } d_un;
>> >>>>>>       } Elf32_Dyn;
>> >>>>>>     - 0x11 entries
>> >>>>>>       [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>> >>>>>>       [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>> >>>>>>       [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>> >>>>>>       [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>> >>>>>>       [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>> >>>>>>       [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>> >>>>>>       [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>> >>>>>>       [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>> >>>>>>       [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>> >>>>>>
>> >>>>> How big DT_REL is
>> >>>>>
>> >>>>>>       [09] 0x00000013, 0x00000008 DT_RELENT, ???
>> >>>>>>
>> >>>>> hmm, cannot remeber :)
>> >>>>>
>> >>>> How big an entry in DT_REL is
>> >>>>
>> >>> Right, how could I forget :)
>> >>>
>> >>>>>>       [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>> >>>>>>
>> >>>>> Oops, you got text relocations. This is generally a bad thing.
>> >>>>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
>> >>>>> to modify the .text segment to adjust for relocation.
>> >>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
>> >>>>> the culprit.
>> >>>>>
>> >>>>>
>> >>>> Alas I cannot - The relocations are a result of loading a register with a
>> >>>> return address when calling show_boot_progress in the very early stages of
>> >>>> initialisation prior to the stack becoming available. The x86 does not
>> >>>> allow direct access to the IP so the only way to find the 'current
>> >>>> execution address' is to 'call' to the next instruction and pop the return
>> >>>> address off the stack
>> >>>>
>> >>> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
>> >>> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
>> >>> but it is not important ATM I guess.
>> >>>
>> >>> Evil idea, skip -fpic et. all and add the full reloc procedure
>> >>> to relocate by rewriting directly in TEXT segment. Then you save space
>> >>> but you need more relocation code. Something like dl_do_reloc from
>> >>> uClibc. Wonder how much extra code that would be? Not too much I think.
>> >>>
>> >>>
>> >> With the following flags
>> >>
>> >> PLATFORM_RELFLAGS += -fvisibility=hidden
>> >> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>> >> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
>> >>
>> >> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
>> >> this might mean I need the symbol table in the binary in order to resolve
>> >> them
>> >>
>
> BTW, how many relocs do you get compared with -fPIC? I suspect you more
> now but hopefully not that many more.
>
>> >
>> > Possibly, but I think you only need to add an offset to all those
>> > relocs.
>> >
>> Almost right. The relocations specify a symbol value that needs to be
>> added to the data in memory to relocate the reference. The symbol values
>> involved should be the start of the text section for program references,
>> the start of the uninitialized data section for bss references, and the
>> start of the data section for initialized data and constants. So there
>> are about four symbols whose value you need to keep. Take a look at
>> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
>> already looked at) and it tells you what to do with R_386_PC32 ad
>> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
>> will remove all the symbols you don't actually need, but I don't know
>> that for sure. Note also that you can change the section flags of a
>> section marked noload  to load.
>
> Still think you can get away with just ADDING an offset. The image is linked to a
> specific address and then you move the whole image to a new address. Therefore
> you should be able to read the current address, add offset, write back the new address.
>

OK, I don't really get this at all....

This code:

    printf ("\n\n%s\n\n", version_string);

gets compiled into:

    380403e7:   68 a4 18 05 38   push   $0x380518a4
    380403ec:   68 de 2c 05 38   push   $0x38052cde
    380403f1:   e8 4f 84 00 00   call   38048845 <printf>

With relocation entries in .rel.text of:

   Offset    Info      Type        Sym.Value  Sym. Name
   380403e8  00016201  R_386_32    380519f0   version_string
   380403ed  00000201  R_386_32    380519f0   .rodata
   380403f2  00016b02  R_386_PC32  38048991   printf

Now I get the first two (R_386_32) entries - Relocation involves a simple
addition of an offset to the values at addresses 0x380403e8 and 0x380403ed
(of course, these addresses will be offset)

However, the R_386_PC32 is an enigma - The call is already relative -
there is no need to relocate it at all (call is a position independent
opcode because it is a relative jump!)

Will all R_386_PC32 be like this? Can I simply ignore them all? If so, why
do they even need to be generated?
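
A small sketch of the arithmetic behind that observation, using the bytes
from the disassembly above. The helper name call_target and the move
distance used in the usage note are assumptions. The target of an E8 call is
the address of the next instruction plus the 32-bit displacement, so when the
caller and the callee move by the same amount the stored displacement stays
valid. As far as I know, --emit-relocs simply keeps the link-time relocation
records in the output, including ones the linker has already resolved, which
would explain why such entries still appear.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Target of an x86 near call (opcode E8 followed by rel32).
 * The instruction is 5 bytes long and the displacement is relative
 * to the address of the following instruction.
 */
uint32_t call_target(uint32_t insn_addr, uint32_t rel32)
{
    return insn_addr + 5u + rel32;
}
```

For the call at 0x380403f1 with displacement 0x0000844f this gives
0x38048845, the address shown for printf in the disassembly. Moving the whole
image by, say, 0x00100000 shifts both the instruction and printf by that
amount, and the same displacement still lands on printf.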

Hmmm

Graeme

> Normally one do what you describe but here we know that the whole img has moved so
> we don't have to do calculate the new address from scratch.
>
>       Jocke
>
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-14 11:48                                           ` Graeme Russ
@ 2009-10-14 12:38                                             ` Joakim Tjernlund
  2009-10-14 16:45                                               ` J. William Campbell
  0 siblings, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-14 12:38 UTC (permalink / raw)
  To: u-boot

Graeme Russ <graeme.russ@gmail.com> wrote on 14/10/2009 13:48:27:
>
> On Wed, Oct 14, 2009 at 6:25 PM, Joakim Tjernlund
> <joakim.tjernlund@transmode.se> wrote:
> > "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
> >>
> >> Joakim Tjernlund wrote:
> >> > Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
> >> >
> >> >
> >> >> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
> >> >> <joakim.tjernlund@transmode.se> wrote:
> >> >>
> >> >>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
> >> >>>
> >> >>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
> >> >>>> <joakim.tjernlund@transmode.se> wrote:
> >> >>>>
> >> >>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
> >> >>>>>
> >> >>>> [Massive Snip :)]
> >> >>>>
> >> >>>>
> >> >>>>>> So, all that is left are .dynsym and .dynamic ...
> >> >>>>>>   .dynsym
> >> >>>>>>     - Contains 70 entries (16 bytes each, 1120 bytes)
> >> >>>>>>     - 44 entries mimic those entries in .got which are not relocated
> >> >>>>>>     - 21 entries are the remaining symbols exported from the linker
> >> >>>>>>       script
> >> >>>>>>     - 4 entries are labels defined in inline asm and used in C
> >> >>>>>>
> >> >>>>> Try adding proper asm declarations. Look at what gcc
> >> >>>>> generates for a function/variable and mimic these.
> >> >>>>>
> >> >>>> Thanks - Now .dynsym contains only exports from the linker script
> >> >>>>
> >> >>> :)
> >> >>>
> >> >>>>>>     - 1 entry is a NULL entry
> >> >>>>>>
> >> >>>>>>   .dynamic
> >> >>>>>>     - 88 bytes
> >> >>>>>>     - Array of Elf32_Dyn
> >> >>>>>>     - typedef struct {
> >> >>>>>>           Elf32_Sword     d_tag;
> >> >>>>>>           union {
> >> >>>>>>               Elf32_Word  d_val;
> >> >>>>>>               Elf32_Addr  d_ptr;
> >> >>>>>>           } d_un;
> >> >>>>>>       } Elf32_Dyn;
> >> >>>>>>     - 0x11 entries
> >> >>>>>>       [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
> >> >>>>>>       [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
> >> >>>>>>       [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
> >> >>>>>>       [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
> >> >>>>>>       [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
> >> >>>>>>       [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
> >> >>>>>>       [06] 0x00000015, 0x00000000 DT_DEBUG, ???
> >> >>>>>>       [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
> >> >>>>>>       [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
> >> >>>>>>
> >> >>>>> How big DT_REL is
> >> >>>>>
> >> >>>>>>       [09] 0x00000013, 0x00000008 DT_RELENT, ???
> >> >>>>>>
> >> >>>>> hmm, cannot remeber :)
> >> >>>>>
> >> >>>> How big an entry in DT_REL is
> >> >>>>
> >> >>> Right, how could I forget :)
> >> >>>
> >> >>>>>>       [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
> >> >>>>>>
> >> >>>>> Oops, you got text relocations. This is generally a bad thing.
> >> >>>>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
> >> >>>>> to modify the .text segment to adjust for relocation.
> >> >>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
> >> >>>>> the culprit.
> >> >>>>>
> >> >>>>>
> >> >>>> Alas I cannot - The relocations are a result of loading a register with a
> >> >>>> return address when calling show_boot_progress in the very early stages of
> >> >>>> initialisation prior to the stack becoming available. The x86 does not
> >> >>>> allow direct access to the IP so the only way to find the 'current
> >> >>>> execution address' is to 'call' to the next instruction and pop the return
> >> >>>> address off the stack
> >> >>>>
> >> >>> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
> >> >>> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
> >> >>> but it is not important ATM I guess.
> >> >>>
> >> >>> Evil idea, skip -fpic et. all and add the full reloc procedure
> >> >>> to relocate by rewriting directly in TEXT segment. Then you save space
> >> >>> but you need more relocation code. Something like dl_do_reloc from
> >> >>> uClibc. Wonder how much extra code that would be? Not too much I think.
> >> >>>
> >> >>>
> >> >> With the following flags
> >> >>
> >> >> PLATFORM_RELFLAGS += -fvisibility=hidden
> >> >> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
> >> >> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
> >> >>
> >> >> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
> >> >> this might mean I need the symbol table in the binary in order to resolve
> >> >> them
> >> >>
> >
> > BTW, how many relocs do you get compared with -fPIC? I suspect you more
> > now but hopefully not that many more.
> >
> >> >
> >> > Possibly, but I think you only need to add an offset to all those
> >> > relocs.
> >> >
> >> Almost right. The relocations specify a symbol value that needs to be
> >> added to the data in memory to relocate the reference. The symbol values
> >> involved should be the start of the text section for program references,
> >> the start of the uninitialized data section for bss references, and the
> >> start of the data section for initialized data and constants. So there
> >> are about four symbols whose value you need to keep. Take a look at
> >> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
> >> already looked at) and it tells you what to do with R_386_PC32 ad
> >> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
> >> will remove all the symbols you don't actually need, but I don't know
> >> that for sure. Note also that you can change the section flags of a
> >> section marked noload  to load.
> >
> > Still think you can get away with just ADDING an offset. The image is linked to a
> > specific address and then you move the whole image to a new address. Therefore
> > you should be able to read the current address, add offset, write back the
> new address.
> >
>
> OK, I don't really get this at all....
>
> This code:
>
>     printf ("\n\n%s\n\n", version_string);
>
> gets compiled into:
>
>     380403e7:   68 a4 18 05 38   push   $0x380518a4
>     380403ec:   68 de 2c 05 38   push   $0x38052cde
>     380403f1:   e8 4f 84 00 00   call   38048845 <printf>
>
> With relocation entries in .rel.text of:
>
>  Offset     Info    Type            Sym.Value  Sym. Name
>    380403e8  00016201 R_386_32          380519f0   version_string
>    380403ed  00000201 R_386_32          380519f0   .rodata
>    380403f2  00016b02 R_386_PC32        38048991   printf
>
> Now I get the first two (R_386_32) entries - Relocation involves a simple
> addition of an offset to the values at addresses 0x380403e8 and 0x380403ed
> (of course, these addresses will be offset)
>
> However, the R_386_PC32 is an enigma - The call is already relative -
> there is no need to relocate it at all (call is a position independent
> opcode because it is a relative jump!)

Yes, but printf is defined in glibc, so the app needs to relocate the call
to glibc. U-Boot has all it needs, so there you should not have PC32 I think.
Try defining a local static function. For non static functions
you may need to define visibility=hidden and/or -Bsymbolic too.
You also need to look at the img after final linking.

>
> Will all R_386_PC32 be like this? Can I simply ignore them all? If so, why
> do they even need to be generated?

Hopefully you won't have any. Not sure about weak functions though. These might
need PC32 relocs in some cases.

Also, if you look at _dl_do_reloc() in uClibc/ldso/ldso/i386/elfinterp.c I think
you can replace symbol_addr with relocation offset.
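
A rough sketch of what such a loop might look like when the whole image moves by one offset. This is not the uClibc code; `struct rel_entry` only mirrors the layout of Elf32_Rel, and `relocate_image`, `REL_TYPE` and the parameter names are assumptions for illustration. The relocation type numbers are the ones from the ELF i386 spec:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf32_Rel, redefined locally so the sketch does not
 * depend on <elf.h>. */
struct rel_entry {
    uint32_t r_offset;   /* link-time address of the word to patch */
    uint32_t r_info;     /* (symbol index << 8) | relocation type  */
};

#define REL_TYPE(info) ((info) & 0xffu)
#define R_386_32    1u   /* absolute 32-bit address */
#define R_386_PC32  2u   /* PC-relative 32-bit displacement */

/* image:     the copy of the image at its new location
 * link_base: address the image was linked at
 * delta:     new base minus link_base
 * Returns the number of words patched. */
size_t relocate_image(uint8_t *image, size_t image_size, uint32_t link_base,
                      const struct rel_entry *rel, size_t count,
                      uint32_t delta)
{
    size_t patched = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t off = rel[i].r_offset - link_base;
        uint32_t word;

        /* PC-relative entries are left alone: both ends moved together. */
        if (REL_TYPE(rel[i].r_info) != R_386_32)
            continue;
        /* Skip entries that do not point inside the image. */
        if (image_size < 4 || off > image_size - 4)
            continue;

        memcpy(&word, image + off, sizeof word);  /* may be unaligned */
        word += delta;                            /* just add the offset */
        memcpy(image + off, &word, sizeof word);
        patched++;
    }
    return patched;
}
```

No symbol table lookup is involved here; only r_offset and the type byte of r_info are read.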

    Jocke

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-14  7:25                                         ` Joakim Tjernlund
  2009-10-14 11:48                                           ` Graeme Russ
@ 2009-10-14 15:35                                           ` J. William Campbell
  2009-10-14 16:05                                             ` Joakim Tjernlund
  1 sibling, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-14 15:35 UTC (permalink / raw)
  To: u-boot

Joakim Tjernlund wrote:
> "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
>   
>> Joakim Tjernlund wrote:
>>     
>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
>>>
>>>
>>>       
>>>> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>
>>>>         
>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>>>>>
>>>>>           
>>>>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>
>>>>>>             
>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>>>>>>
>>>>>>>               
>>>>>> [Massive Snip :)]
>>>>>>
>>>>>>
>>>>>>             
>>>>>>>> So, all that is left are .dynsym and .dynamic ...
>>>>>>>>   .dynsym
>>>>>>>>     - Contains 70 entries (16 bytes each, 1120 bytes)
>>>>>>>>     - 44 entries mimic those entries in .got which are not relocated
>>>>>>>>     - 21 entries are the remaining symbols exported from the linker
>>>>>>>>       script
>>>>>>>>     - 4 entries are labels defined in inline asm and used in C
>>>>>>>>
>>>>>>>>                 
>>>>>>> Try adding proper asm declarations. Look at what gcc
>>>>>>> generates for a function/variable and mimic these.
>>>>>>>
>>>>>>>               
>>>>>> Thanks - Now .dynsym contains only exports from the linker script
>>>>>>
>>>>>>             
>>>>> :)
>>>>>
>>>>>           
>>>>>>>>     - 1 entry is a NULL entry
>>>>>>>>
>>>>>>>>   .dynamic
>>>>>>>>     - 88 bytes
>>>>>>>>     - Array of Elf32_Dyn
>>>>>>>>     - typedef struct {
>>>>>>>>           Elf32_Sword     d_tag;
>>>>>>>>           union {
>>>>>>>>               Elf32_Word  d_val;
>>>>>>>>               Elf32_Addr  d_ptr;
>>>>>>>>           } d_un;
>>>>>>>>       } Elf32_Dyn;
>>>>>>>>     - 0x11 entries
>>>>>>>>       [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>>>>>>>>       [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>>>>>>>>       [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>>>>>>>>       [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>>>>>>>>       [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>>>>>>>>       [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>>>>>>>>       [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>>>>>>>>       [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>>>>>>>>       [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>>>>>>>>
>>>>>>>>                 
>>>>>>> How big DT_REL is
>>>>>>>
>>>>>>>               
>>>>>>>>       [09] 0x00000013, 0x00000008 DT_RELENT, ???
>>>>>>>>
>>>>>>>>                 
>>>>>>> hmm, cannot remeber :)
>>>>>>>
>>>>>>>               
>>>>>> How big an entry in DT_REL is
>>>>>>
>>>>>>             
>>>>> Right, how could I forget :)
>>>>>
>>>>>           
>>>>>>>>       [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>>>>>>>>
>>>>>>>>                 
>>>>>>> Oops, you got text relocations. This is generally a bad thing.
>>>>>>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
>>>>>>> to modify the .text segment to adjust for relocation.
>>>>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
>>>>>>> the culprit.
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> Alas I cannot - The relocations are a result of loading a register with a
>>>>>> return address when calling show_boot_progress in the very early stages of
>>>>>> initialisation prior to the stack becoming available. The x86 does not
>>>>>> allow direct access to the IP so the only way to find the 'current
>>>>>> execution address' is to 'call' to the next instruction and pop the return
>>>>>> address off the stack
>>>>>>
>>>>>>             
>>>>> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
>>>>> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
>>>>> but it is not important ATM I guess.
>>>>>
>>>>> Evil idea, skip -fpic et. all and add the full reloc procedure
>>>>> to relocate by rewriting directly in TEXT segment. Then you save space
>>>>> but you need more relocation code. Something like dl_do_reloc from
>>>>> uClibc. Wonder how much extra code that would be? Not too much I think.
>>>>>
>>>>>
>>>>>           
>>>> With the following flags
>>>>
>>>> PLATFORM_RELFLAGS += -fvisibility=hidden
>>>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>>>> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
>>>>
>>>> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
>>>> this might mean I need the symbol table in the binary in order to resolve
>>>> them
>>>>
>>>>         
>
> BTW, how many relocs do you get compared with -fPIC? I suspect you more
> now but hopefully not that many more.
>
>   
>>> Possibly, but I think you only need to add an offset to all those
>>> relocs.
>>>
>>>       
>> Almost right. The relocations specify a symbol value that needs to be
>> added to the data in memory to relocate the reference. The symbol values
>> involved should be the start of the text section for program references,
>> the start of the uninitialized data section for bss references, and the
>> start of the data section for initialized data and constants. So there
>> are about four symbols whose value you need to keep. Take a look at
>> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
>> already looked at) and it tells you what to do with R_386_PC32 ad
>> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
>> will remove all the symbols you don't actually need, but I don't know
>> that for sure. Note also that you can change the section flags of a
>> section marked noload  to load.
>>     
>
> Still think you can get away with just ADDING an offset. The image is linked to a
> specific address and then you move the whole image to a new address. Therefore
> you should be able to read the current address, add offset, write back the new address.
>
> Normally one do what you describe but here we know that the whole img has moved so
> we don't have to do calculate the new address from scratch.
>   
If the addresses of the bss, text, and data segments change by the same 
value, I think you are correct. However, if the text and data/bss 
segments are moved by different offsets, naturally the relocations would 
be different. One reason to retain this capability would be to allow the 
u-boot copy to execute in place in NOR flash while re-locating the 
read-write storage once memory has been sized. Having different 
relocation factors is not much worse than just one, and it may be just 
as easy to get working initially as a single relocation constant.
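
As a sketch of what "different relocation factors" could mean in code: the factor is picked by checking which link-time segment the stored address points into. `struct seg_move` and `relocate_addr` are hypothetical names, and any segment layout passed in is an assumption, not taken from a real build:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how one segment is moved. */
struct seg_move {
    uint32_t start;   /* link-time start address of the segment */
    uint32_t size;    /* size of the segment in bytes */
    uint32_t delta;   /* new address minus link-time address */
};

/* Returns the relocated value of an absolute address by applying the
 * delta of whichever segment the address falls into. */
uint32_t relocate_addr(uint32_t addr, const struct seg_move *segs,
                       size_t nsegs)
{
    for (size_t i = 0; i < nsegs; i++) {
        /* Unsigned subtraction makes this a single range check. */
        if (addr - segs[i].start < segs[i].size)
            return addr + segs[i].delta;
    }
    return addr;   /* not inside any known segment: leave untouched */
}
```

With a single entry covering the whole image this reduces to the one-constant case. It only selects the factor; whether the word holding the address can be written at all is a separate question.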

FWIW, the "ultimate" solution to minimum relocation size is a 
post-processing step that creates "several" arrays of relocation offsets 
as two byte quantities. This reduces the cost of each relocation entry 
to just a bit more than two bytes (there is a small overhead for array 
size, MSB values and relocation offset selection.) Naturally, this is 
much less than the ELF version of the same relocations, because we do 
not need to retain as much information and ELF doesn't worry about size 
that much. This may pacify users for whom the flash size of the image 
is critical, at the expense of an extra link step. Naturally, getting 
things to work with "standard ELF" is the most important step, and 
probably enough for most people.
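
One possible shape for such a table, purely as a sketch: each group holds the upper 16 bits shared by its relocation sites plus an array of 16-bit low halves. The layout, `struct reloc_group` and `apply_compact_relocs` are all assumptions for illustration, and a real post-processing tool would have to emit the groups:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compact table: one group per 64 KiB window of the image. */
struct reloc_group {
    uint16_t msb;          /* upper 16 bits of every offset in the group */
    uint16_t count;        /* number of entries in lsb[] */
    const uint16_t *lsb;   /* lower 16 bits of each image offset */
};

/* Adds delta to every 32-bit word named by the table.
 * Returns the number of words patched. */
size_t apply_compact_relocs(uint8_t *image, size_t image_size,
                            const struct reloc_group *groups,
                            size_t ngroups, uint32_t delta)
{
    size_t patched = 0;

    for (size_t g = 0; g < ngroups; g++) {
        for (uint16_t i = 0; i < groups[g].count; i++) {
            uint32_t off = ((uint32_t)groups[g].msb << 16) | groups[g].lsb[i];
            uint32_t word;

            /* Ignore entries that fall outside the image. */
            if (image_size < 4 || off > image_size - 4)
                continue;

            memcpy(&word, image + off, sizeof word);
            word += delta;
            memcpy(image + off, &word, sizeof word);
            patched++;
        }
    }
    return patched;
}
```

Each site costs two bytes in lsb[], plus the per-group header, which is the "small overhead" mentioned above.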

I also am interested in the number of additional relocations generated 
without -fpic. I suspect on the 386 it can be substantial. However, for 
every new reloc generated, a .got reference load will probably be 
eliminated. This should result in a shorter text segment to balance the 
increased relocation segment. Adding the -fno-jump-tables gcc option may 
also help a bit.

Bill Campbell

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-14 15:35                                           ` J. William Campbell
@ 2009-10-14 16:05                                             ` Joakim Tjernlund
  2009-10-14 16:49                                               ` J. William Campbell
  0 siblings, 1 reply; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-14 16:05 UTC (permalink / raw)
  To: u-boot

"J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 17:35:44:
>
> Joakim Tjernlund wrote:
> > "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
> >
> >> Joakim Tjernlund wrote:
> >>
> >>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
> >>>
> >>>
> >>>
> >>>> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
> >>>> <joakim.tjernlund@transmode.se> wrote:
> >>>>
> >>>>
> >>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
> >>>>>
> >>>>>
> >>>>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
> >>>>>> <joakim.tjernlund@transmode.se> wrote:
> >>>>>>
> >>>>>>
> >>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
> >>>>>>>
> >>>>>>>
> >>>>>> [Massive Snip :)]

[Yet another SNIP :)]

> >>>>> Evil idea, skip -fpic et. all and add the full reloc procedure
> >>>>> to relocate by rewriting directly in TEXT segment. Then you save space
> >>>>> but you need more relocation code. Something like dl_do_reloc from
> >>>>> uClibc. Wonder how much extra code that would be? Not too much I think.
> >>>>>
> >>>>>
> >>>>>
> >>>> With the following flags
> >>>>
> >>>> PLATFORM_RELFLAGS += -fvisibility=hidden
> >>>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
> >>>> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
> >>>>
> >>>> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
> >>>> this might mean I need the symbol table in the binary in order to resolve
> >>>> them
> >>>>
> >>>>
> >
> > BTW, how many relocs do you get compared with -fPIC? I suspect you more
> > now but hopefully not that many more.
> >
> >
> >>> Possibly, but I think you only need to add an offset to all those
> >>> relocs.
> >>>
> >>>
> >> Almost right. The relocations specify a symbol value that needs to be
> >> added to the data in memory to relocate the reference. The symbol values
> >> involved should be the start of the text section for program references,
> >> the start of the uninitialized data section for bss references, and the
> >> start of the data section for initialized data and constants. So there
> >> are about four symbols whose value you need to keep. Take a look at
> >> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
> >> already looked at) and it tells you what to do with R_386_PC32 ad
> >> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
> >> will remove all the symbols you don't actually need, but I don't know
> >> that for sure. Note also that you can change the section flags of a
> >> section marked noload  to load.
> >>
> >
> > Still think you can get away with just ADDING an offset. The image is linked to a
> > specific address and then you move the whole image to a new address. Therefore
> > you should be able to read the current address, add offset, write back the
> new address.
> >
> > Normally one do what you describe but here we know that the whole img has moved so
> > we don't have to do calculate the new address from scratch.
> >
> If the addresses of the bss, text, and data segments change by the same
> value, I think you are correct. However, if the text and data/bss
> segments are moved by different offsets, naturally the relocations would
> be different. One reason to retain this capability would be to allow the
> u-boot copy to execute in place in NOR flash while re-locating the
> read-write storage once memory has been sized. Having different
> relocation factors is not much worse than just one, and it may be just
> as easy to get working initially as a single relocation constant.

How do you figure that? You need to rewrite the insns to access the moved
data/bss, and they are in flash. Did I miss something?

>
> FWIW, the "ultimate" solution to minimum relocation size is a
> post-processing step that creates "several" arrays of relocation offsets
> as two byte quantities. This reduces the cost of each relocation entry
> to just a bit more than two bytes (there is a small overhead for array
> size, MSB values and relocation offset selection.) Naturally, this is
> much less than the ELF version of the same relocations, because we do
> not need to retain as much information and ELF doesn't worry about size
> that much.. This may pacify users for which the flash size of the image
> is critical, at the expense of an extra link step. Naturally, getting
> things to work with "standard ELF" is the most important step, and
> probably enough for most people.

That would save 2+4 = 6 bytes/reloc on REL arches (8-byte entries) and
2+4+4 = 10 on RELA (ppc, 12-byte entries), provided one can ignore r_addend.
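
For a feel of the numbers, a tiny host-side sketch (the function name `compact_saving` is made up; it ignores the per-array overhead, so it is an upper bound on the saving):

```c
#include <stdint.h>

enum {
    REL_ENTRY_SIZE     = 8,    /* r_offset + r_info */
    RELA_ENTRY_SIZE    = 12,   /* r_offset + r_info + r_addend */
    COMPACT_ENTRY_SIZE = 2     /* one 16-bit offset per relocation */
};

/* Bytes saved by replacing a table of table_bytes bytes, made of
 * entry_size-byte entries, with 2-byte compact entries. */
uint32_t compact_saving(uint32_t table_bytes, uint32_t entry_size)
{
    uint32_t entries = table_bytes / entry_size;

    return entries * (entry_size - COMPACT_ENTRY_SIZE);
}
```

Taking the DT_RELSZ of 0x14D8 quoted earlier in the thread as an example, that is 667 entries of 8 bytes, so `compact_saving(0x14d8, REL_ENTRY_SIZE)` comes to 4002 bytes.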

But yes, this is probably too "fancy" for the moment.

  Jocke

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [U-Boot] Relocation size penalty calculation
  2009-10-14 12:38                                             ` Joakim Tjernlund
@ 2009-10-14 16:45                                               ` J. William Campbell
  2009-10-17  5:17                                                 ` Graeme Russ
  0 siblings, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-14 16:45 UTC (permalink / raw)
  To: u-boot

Joakim Tjernlund wrote:
> Graeme Russ <graeme.russ@gmail.com> wrote on 14/10/2009 13:48:27:
>   
>> On Wed, Oct 14, 2009 at 6:25 PM, Joakim Tjernlund
>> <joakim.tjernlund@transmode.se> wrote:
>>     
>>> "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
>>>       
>>>> Joakim Tjernlund wrote:
>>>>         
>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
>>>>>
>>>>>
>>>>>           
>>>>>> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>
>>>>>>             
>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>>>>>>>
>>>>>>>               
>>>>>>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>>>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> [Massive Snip :)]
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>>> So, all that is left are .dynsym and .dynamic ...
>>>>>>>>>>   .dynsym
>>>>>>>>>>     - Contains 70 entries (16 bytes each, 1120 bytes)
>>>>>>>>>>     - 44 entries mimic those entries in .got which are not relocated
>>>>>>>>>>     - 21 entries are the remaining symbols exported from the linker
>>>>>>>>>>       script
>>>>>>>>>>     - 4 entries are labels defined in inline asm and used in C
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>> Try adding proper asm declarations. Look at what gcc
>>>>>>>>> generates for a function/variable and mimic these.
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> Thanks - Now .dynsym contains only exports from the linker script
>>>>>>>>
>>>>>>>>                 
>>>>>>> :)
>>>>>>>
>>>>>>>               
>>>>>>>>>>     - 1 entry is a NULL entry
>>>>>>>>>>
>>>>>>>>>>   .dynamic
>>>>>>>>>>     - 88 bytes
>>>>>>>>>>     - Array of Elf32_Dyn
>>>>>>>>>>     - typedef struct {
>>>>>>>>>>           Elf32_Sword     d_tag;
>>>>>>>>>>           union {
>>>>>>>>>>               Elf32_Word  d_val;
>>>>>>>>>>               Elf32_Addr  d_ptr;
>>>>>>>>>>           } d_un;
>>>>>>>>>>       } Elf32_Dyn;
>>>>>>>>>>     - 0x11 entries
>>>>>>>>>>       [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>>>>>>>>>>       [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>>>>>>>>>>       [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>>>>>>>>>>       [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>>>>>>>>>>       [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>>>>>>>>>>       [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>>>>>>>>>>       [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>>>>>>>>>>       [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>>>>>>>>>>       [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>> How big DT_REL is
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>>       [09] 0x00000013, 0x00000008 DT_RELENT, ???
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>> hmm, cannot remeber :)
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> How big an entry in DT_REL is
>>>>>>>>
>>>>>>>>                 
>>>>>>> Right, how could I forget :)
>>>>>>>
>>>>>>>               
>>>>>>>>>>       [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>> Oops, you got text relocations. This is generally a bad thing.
>>>>>>>>> TEXTREL is commonly caused by asm code that arent truly pic so it needs
>>>>>>>>> to modify the .text segment to adjust for relocation.
>>>>>>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to find
>>>>>>>>> the culprit.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> Alas I cannot - The relocations are a result of loading a register with a
>>>>>>>> return address when calling show_boot_progress in the very early stages of
>>>>>>>> initialisation prior to the stack becoming available. The x86 does not
>>>>>>>> allow direct access to the IP so the only way to find the 'current
>>>>>>>> execution address' is to 'call' to the next instruction and pop the return
>>>>>>>> address off the stack
>>>>>>>>
>>>>>>>>                 
>>>>>>> hmm, same as ppc but that in it self should not cause a TEXREL, should it?
>>>>>>> Ahh, the 'call' is absolute, not relative? I guess there is some way around it
>>>>>>> but it is not important ATM I guess.
>>>>>>>
>>>>>>> Evil idea, skip -fpic et. all and add the full reloc procedure
>>>>>>> to relocate by rewriting directly in TEXT segment. Then you save space
>>>>>>> but you need more relocation code. Something like dl_do_reloc from
>>>>>>> uClibc. Wonder how much extra code that would be? Not too much I think.
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> With the following flags
>>>>>>
>>>>>> PLATFORM_RELFLAGS += -fvisibility=hidden
>>>>>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>>>>>> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
>>>>>>
>>>>>> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
>>>>>> this might mean I need the symbol table in the binary in order to resolve
>>>>>> them
>>>>>>
>>>>>>             
>>> BTW, how many relocs do you get compared with -fPIC? I suspect you more
>>> now but hopefully not that many more.
>>>
>>>       
>>>>> Possibly, but I think you only need to add an offset to all those
>>>>> relocs.
>>>>>
>>>>>           
>>>> Almost right. The relocations specify a symbol value that needs to be
>>>> added to the data in memory to relocate the reference. The symbol values
>>>> involved should be the start of the text section for program references,
>>>> the start of the uninitialized data section for bss references, and the
>>>> start of the data section for initialized data and constants. So there
>>>> are about four symbols whose value you need to keep. Take a look at
>>>> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
>>>> already looked at) and it tells you what to do with R_386_PC32 ad
>>>> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
>>>> will remove all the symbols you don't actually need, but I don't know
>>>> that for sure. Note also that you can change the section flags of a
>>>> section marked noload  to load.
>>>>         
>>> Still think you can get away with just ADDING an offset. The image is linked to a
>>> specific address and then you move the whole image to a new address. Therefore
>>> you should be able to read the current address, add offset, write back the
>>>       
>> new address.
>>     
>> OK, I don't really get this at all....
>>
>> This code:
>>
>>     printf ("\n\n%s\n\n", version_string);
>>
>> gets compiled into:
>>
>>     380403e7:   68 a4 18 05 38   push   $0x380518a4
>>     380403ec:   68 de 2c 05 38   push   $0x38052cde
>>     380403f1:   e8 4f 84 00 00   call   38048845 <printf>
>>
>> With relocation entries in .rel.text of:
>>
>>  Offset     Info    Type            Sym.Value  Sym. Name
>>    380403e8  00016201 R_386_32          380519f0   version_string
>>    380403ed  00000201 R_386_32          380519f0   .rodata
>>    380403f2  00016b02 R_386_PC32        38048991   printf
>>
>> Now I get the first two (R_386_32) entries - Relocation involves a simple
>> addition of an offset to the values at addresses 0x380403e8 and 0x380403ed
>> (of course, these addresses will be offset)
>>
>> However, the R_386_PC32 is an enigma - The call is already relative -
>> there is no need to relocate it at all (call is a position independent
>> opcode because it is a relative jump!)
>>     
>
> Yes, but printf is defined in glibc, so the app needs to relocate the call
> to glibc.
Actually, the reason the call is relocatable is that the compiler 
DOESN'T KNOW where printf is at all. If it is in a library, it will not 
be in the text segment and must be relocated accordingly. It may be in 
a different segment for some reason. In any case, the compiler doesn't 
know the address in the image where printf resides, so it needs a 
relocation entry to get the value filled in at link time. After the 
value is filled in, if the referenced symbol is in the same segment 
(probably .text) as the point of reference, the relocation reference is 
probably of no more use. However, there is no rule that says the linker 
must delete the reference from the relocation list.
>  U-boot has all it needs so there you should not have PC32 I think.
> Try defining a local static function. For non static functions
> you may need to define visibility=hidden and/or -Bsymbolic too.
>   
Won't help. Any symbols referenced but not defined locally are 
relocatable. After linking, they MAY, but need not, go away.
> You also need to look at the img after final linking.
>   
After linking, if the symbol is defined, the R_386_PC32 is no longer 
important UNLESS the symbol referenced is in a different segment AND the 
segments are relocated with different offsets from each other than 
originally linked. For this reason, I think the linker will not discard 
these relocations. If we are not relocating the segments with different 
relative offsets, we can ignore these relocations as the change in 
offset will come out to be zero anyway. However, if you process them 
normally, you will just add 0 and nothing will change.
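
The "just add 0" point can be written down directly. A PC32 field holds S + A - P (symbol, addend, place), so if the symbol's segment moves by one delta and the place being patched moves by another, the stored value changes by their difference. A minimal sketch, with `pc32_fixup` and its parameter names being assumptions:

```c
#include <stdint.h>

/* stored:      the 32-bit PC-relative value currently in the image
 * delta_sym:   how far the segment containing the target moved
 * delta_place: how far the segment containing the patched word moved
 * Returns the value the field must hold after the move. */
uint32_t pc32_fixup(uint32_t stored, uint32_t delta_sym, uint32_t delta_place)
{
    return stored + (delta_sym - delta_place);
}
```

When the whole image moves as one block the two deltas are equal and the result is the stored value unchanged, which is why skipping these entries and processing them normally give the same image.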
>   
>> Will all R_386_PC32 be like this? Can I simply ignore them all? If so, why
>> do they even need to be generated?
>>     
>
> Hopefully you won't have any.
I think they may still be there, because we ask the linker to preserve 
relocation information. However, if the entire image is being relocated, 
not changing the order or relative offset of any segments, they can be 
ignored, because the relative values will not change. It will be 
interesting to know if they remain or if the linker drops them out. For 
references in the same segment, we can hope that they get dropped. For 
references across segments (if any), or any undefined symbols, they will 
remain.
>  Not sure about weak functions though. These might
> need PC32 relocs in some cases.
>   
There can be PC32 relocs referencing the weak symbol, but that symbol 
may be undefined.
> Also, if you look at _dl_do_reloc() in uClibc/ldso/ldso/i386/elfinterp.c I think
> you can replace symbol_addr with relocation offset.
>   
I agree, in the case you are moving the entire image and ignoring PC32 relocs.

Best Regards,
Bill Campbell
>     Jocke
>
>
>
>
>   


* [U-Boot] Relocation size penalty calculation
  2009-10-14 16:05                                             ` Joakim Tjernlund
@ 2009-10-14 16:49                                               ` J. William Campbell
  0 siblings, 0 replies; 47+ messages in thread
From: J. William Campbell @ 2009-10-14 16:49 UTC (permalink / raw)
  To: u-boot

Joakim Tjernlund wrote:
> "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 17:35:44:
>   
>> Joakim Tjernlund wrote:
>>     
>>> "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009 01:48:52:
>>>
>>>       
>>>> Joakim Tjernlund wrote:
>>>>
>>>>         
>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>>>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> [Massive Snip :)]
>>>>>>>>                 
>
> [Yet another SNIP :)]
>
>   
>>>>>>> Evil idea, skip -fpic et. all and add the full reloc procedure
>>>>>>> to relocate by rewriting directly in TEXT segment. Then you save space
>>>>>>> but you need more relocation code. Something like dl_do_reloc from
>>>>>>> uClibc. Wonder how much extra code that would be? Not too much I think.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> With the following flags
>>>>>>
>>>>>> PLATFORM_RELFLAGS += -fvisibility=hidden
>>>>>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>>>>>> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions
>>>>>>
>>>>>> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
>>>>>> this might mean I need the symbol table in the binary in order to resolve
>>>>>> them
>>>>>>
>>>>>>
>>>>>>             
>>> BTW, how many relocs do you get compared with -fPIC? I suspect you get more
>>> now, but hopefully not that many more.
>>>
>>>
>>>       
>>>>> Possibly, but I think you only need to add an offset to all those
>>>>> relocs.
>>>>>
>>>>>
>>>>>           
>>>> Almost right. The relocations specify a symbol value that needs to be
>>>> added to the data in memory to relocate the reference. The symbol values
>>>> involved should be the start of the text section for program references,
>>>> the start of the uninitialized data section for bss references, and the
>>>> start of the data section for initialized data and constants. So there
>>>> are about four symbols whose value you need to keep. Take a look at
>>>> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
>>>> already looked at) and it tells you what to do with R_386_PC32 and
>>>> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
>>>> will remove all the symbols you don't actually need, but I don't know
>>>> that for sure. Note also that you can change the section flags of a
>>>> section marked noload  to load.
>>>>
>>>>         
>>> Still think you can get away with just ADDING an offset. The image is linked to a
>>> specific address and then you move the whole image to a new address. Therefore
>>> you should be able to read the current address, add offset, write back the
>>>       
>> new address.
>>     
>>> Normally one does what you describe, but here we know that the whole img has moved so
>>> we don't have to calculate the new address from scratch.
>>>
>>>       
>> If the addresses of the bss, text, and data segments change by the same
>> value, I think you are correct. However, if the text and data/bss
>> segments are moved by different offsets, naturally the relocations would
>> be different. One reason to retain this capability would be to allow the
>> u-boot copy to execute in place in NOR flash while re-locating the
>> read-write storage once memory has been sized. Having different
>> relocation factors is not much worse than just one, and it may be just
>> as easy to get working initially as a single relocation constant.
>>     
>
> How do you figure that? You need to rewrite the insn to access the moved
> data/bss and they are in flash, did I miss something?
>   
No, I did. You are quite correct, there would be references in flash 
that couldn't be fixed. Sorry about that.

Best Regards,
Bill Campbell
>   
>> FWIW, the "ultimate" solution to minimum relocation size is a
>> post-processing step that creates "several" arrays of relocation offsets
>> as two byte quantities. This reduces the cost of each relocation entry
>> to just a bit more than two bytes (there is a small overhead for array
>> size, MSB values and relocation offset selection.) Naturally, this is
>> much less than the ELF version of the same relocations, because we do
>> not need to retain as much information and ELF doesn't worry about size
>> that much.. This may pacify users for which the flash size of the image
>> is critical, at the expense of an extra link step. Naturally, getting
>> things to work with "standard ELF" is the most important step, and
>> probably enough for most people.
>>     
>
> That would save 2+4 bytes/reloc on REL arches and
> 2+4+4 on RELA(ppc) (provided one can ignore r_addend)
>
> But yes, this is probably too "fancy" for the moment.
>
>   Jocke
>
>
>
>   
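
One possible shape for the two-byte relocation table discussed above, as a rough sketch only. The layout (a 16-bit "high half" word, a 16-bit count, then the low halves) and the function names are assumptions made for illustration; nothing in the thread fixes an actual format.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack sorted 32-bit relocation offsets into 16-bit words.
 * Hypothetical layout per group: [high16][count][low16 * count].
 * Returns the number of 16-bit words written, or 0 if out is too small
 * (or if there was nothing to pack). */
static size_t pack_relocs(const uint32_t *offs, size_t n,
                          uint16_t *out, size_t out_cap)
{
    size_t w = 0, i = 0;

    while (i < n) {
        uint16_t high = (uint16_t)(offs[i] >> 16);
        size_t start = i;

        /* Find the run of offsets sharing the same upper 16 bits
         * (capped so the count fits in one 16-bit word). */
        while (i < n && (uint16_t)(offs[i] >> 16) == high &&
               i - start < 0xffff)
            i++;

        if (w + 2 + (i - start) > out_cap)
            return 0;
        out[w++] = high;
        out[w++] = (uint16_t)(i - start);
        for (size_t k = start; k < i; k++)
            out[w++] = (uint16_t)(offs[k] & 0xffff);
    }
    return w;
}

/* Expand a packed table back into 32-bit offsets; returns how many,
 * or 0 if the table is truncated or offs is too small. */
static size_t unpack_relocs(const uint16_t *in, size_t words,
                            uint32_t *offs, size_t cap)
{
    size_t r = 0, n = 0;

    while (r + 2 <= words) {
        uint32_t high = (uint32_t)in[r++] << 16;
        uint16_t count = in[r++];

        if (r + count > words || n + count > cap)
            return 0;
        for (uint16_t k = 0; k < count; k++)
            offs[n++] = high | in[r++];
    }
    return n;
}
```

For five offsets spread over two 64kB regions this produces 9 words (18 bytes) where five Elf32_Rel entries would take 40 bytes. The per-group header is the "small overhead" mentioned above and matters less as the groups get longer.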


* [U-Boot] Relocation size penalty calculation
  2009-10-14 16:45                                               ` J. William Campbell
@ 2009-10-17  5:17                                                 ` Graeme Russ
  2009-10-17 12:32                                                   ` Joakim Tjernlund
  2009-10-17 12:59                                                   ` J. William Campbell
  0 siblings, 2 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-17  5:17 UTC (permalink / raw)
  To: u-boot

On Thu, Oct 15, 2009 at 3:45 AM, J. William Campbell
<jwilliamcampbell@comcast.net> wrote:
> Joakim Tjernlund wrote:
>>
>> Graeme Russ <graeme.russ@gmail.com> wrote on 14/10/2009 13:48:27:
>>
>>>
>>> On Wed, Oct 14, 2009 at 6:25 PM, Joakim Tjernlund
>>> <joakim.tjernlund@transmode.se> wrote:
>>>
>>>>
>>>> "J. William Campbell" <jwilliamcampbell@comcast.net> wrote on 14/10/2009
>>>> 01:48:52:
>>>>
>>>>>
>>>>> Joakim Tjernlund wrote:
>>>>>
>>>>>>
>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 22:06:56:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
>>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 13/10/2009 13:21:05:
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
>>>>>>>>> <joakim.tjernlund@transmode.se> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Graeme Russ <graeme.russ@gmail.com> wrote on 11/10/2009 12:47:19:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [Massive Snip :)]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So, all that is left are .dynsym and .dynamic ...
>>>>>>>>>>>  .dynsym
>>>>>>>>>>>    - Contains 70 entries (16 bytes each, 1120 bytes)
>>>>>>>>>>>    - 44 entries mimic those entries in .got which are not
>>>>>>>>>>> relocated
>>>>>>>>>>>    - 21 entries are the remaining symbols exported from the
>>>>>>>>>>> linker
>>>>>>>>>>>      script
>>>>>>>>>>>    - 4 entries are labels defined in inline asm and used in C
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Try adding proper asm declarations. Look at what gcc
>>>>>>>>>> generates for a function/variable and mimic these.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks - Now .dynsym contains only exports from the linker script
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> :)
>>>>>>>>
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - 1 entry is a NULL entry
>>>>>>>>>>>
>>>>>>>>>>>  .dynamic
>>>>>>>>>>>    - 88 bytes
>>>>>>>>>>>    - Array of Elf32_Dyn
>>>>>>>>>>>    - typedef struct {
>>>>>>>>>>>          Elf32_Sword     d_tag;
>>>>>>>>>>>          union {
>>>>>>>>>>>              Elf32_Word  d_val;
>>>>>>>>>>>              Elf32_Addr  d_ptr;
>>>>>>>>>>>          } d_un;
>>>>>>>>>>>      } Elf32_Dyn;
>>>>>>>>>>>    - 0x11 entries
>>>>>>>>>>>      [00] 0x00000010, 0x00000000 DT_SYMBOLIC, (ignored)
>>>>>>>>>>>      [01] 0x00000004, 0x38059994 DT_HASH, points to .hash
>>>>>>>>>>>      [02] 0x00000005, 0x380595AB DT_STRTAB, points to .dynstr
>>>>>>>>>>>      [03] 0x00000006, 0x3805BDCC DT_SYMTAB, points to .dynsym
>>>>>>>>>>>      [04] 0x0000000A, 0x000003E6 DT_STRSZ, size of .dynstr
>>>>>>>>>>>      [05] 0x0000000B, 0x00000010 DT_SYMENT, ???
>>>>>>>>>>>      [06] 0x00000015, 0x00000000 DT_DEBUG, ???
>>>>>>>>>>>      [07] 0x00000011, 0x3805A8F4 DT_REL, points to .rel.text
>>>>>>>>>>>      [08] 0x00000012, 0x000014D8 DT_RELSZ, ???
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> How big DT_REL is
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>      [09] 0x00000013, 0x00000008 DT_RELENT, ???
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> hmm, cannot remember :)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> How big an entry in DT_REL is
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Right, how could I forget :)
>>>>>>>>
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>      [0a] 0x00000016, 0x00000000 DT_TEXTREL, ???
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Oops, you got text relocations. This is generally a bad thing.
>>>>>>>>>> TEXTREL is commonly caused by asm code that isn't truly PIC, so it
>>>>>>>>>> needs to modify the .text segment to adjust for relocation.
>>>>>>>>>> You should get rid of this one. Look for DT_TEXTREL in .o files to
>>>>>>>>>> find the culprit.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Alas I cannot - The relocations are a result of loading a register
>>>>>>>>> with a
>>>>>>>>> return address when calling show_boot_progress in the very early
>>>>>>>>> stages of
>>>>>>>>> initialisation prior to the stack becoming available. The x86 does
>>>>>>>>> not
>>>>>>>>> allow direct access to the IP so the only way to find the 'current
>>>>>>>>> execution address' is to 'call' to the next instruction and pop the
>>>>>>>>> return
>>>>>>>>> address off the stack
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> hmm, same as ppc, but that in itself should not cause a TEXTREL,
>>>>>>>> should it?
>>>>>>>> Ahh, the 'call' is absolute, not relative? I guess there is some way
>>>>>>>> around it, but it is not important ATM I guess.
>>>>>>>>
>>>>>>>> Evil idea, skip -fpic et. all and add the full reloc procedure
>>>>>>>> to relocate by rewriting directly in TEXT segment. Then you save
>>>>>>>> space
>>>>>>>> but you need more relocation code. Something like dl_do_reloc from
>>>>>>>> uClibc. Wonder how much extra code that would be? Not too much I
>>>>>>>> think.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> With the following flags
>>>>>>>
>>>>>>> PLATFORM_RELFLAGS += -fvisibility=hidden
>>>>>>> PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
>>>>>>> PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic
>>>>>>> -Bsymbolic-functions
>>>>>>>
>>>>>>> I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I
>>>>>>> think
>>>>>>> this might mean I need the symbol table in the binary in order to
>>>>>>> resolve
>>>>>>> them
>>>>>>>
>>>>>>>
>>>>
>>>> BTW, how many relocs do you get compared with -fPIC? I suspect you get more
>>>> now, but hopefully not that many more.
>>>>
>>>>
>>>>>>
>>>>>> Possibly, but I think you only need to add an offset to all those
>>>>>> relocs.
>>>>>>
>>>>>>
>>>>>
>>>>> Almost right. The relocations specify a symbol value that needs to be
>>>>> added to the data in memory to relocate the reference. The symbol
>>>>> values
>>>>> involved should be the start of the text section for program
>>>>> references,
>>>>> the start of the uninitialized data section for bss references, and the
>>>>> start of the data section for initialized data and constants. So there
>>>>> are about four symbols whose value you need to keep. Take a look at
>>>>> http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
>>>>> already looked at) and it tells you what to do with R_386_PC32 and
>>>>> R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
>>>>> will remove all the symbols you don't actually need, but I don't know
>>>>> that for sure. Note also that you can change the section flags of a
>>>>> section marked noload  to load.
>>>>>
>>>>
>>>> Still think you can get away with just ADDING an offset. The image is
>>>> linked to a
>>>> specific address and then you move the whole image to a new address.
>>>> Therefore
>>>> you should be able to read the current address, add offset, write back
>>>> the
>>>>
>>>
>>> new address.
>>>    OK, I don't really get this at all....
>>>
>>> This code:
>>>
>>>    printf ("\n\n%s\n\n", version_string);
>>>
>>> gets compiled into:
>>>
>>>    380403e7:   68 a4 18 05 38   push   $0x380518a4
>>>    380403ec:   68 de 2c 05 38   push   $0x38052cde
>>>    380403f1:   e8 4f 84 00 00   call   38048845 <printf>
>>>
>>> With relocation entries in .rel.text of:
>>>
>>>  Offset     Info    Type            Sym.Value  Sym. Name
>>>   380403e8  00016201 R_386_32          380519f0   version_string
>>>   380403ed  00000201 R_386_32          380519f0   .rodata
>>>   380403f2  00016b02 R_386_PC32        38048991   printf
>>>
>>> Now I get the first two (R_386_32) entries - Relocation involves a simple
>>> addition of an offset to the values at addresses 0x380403e8 and
>>> 0x380403ed
>>> (of course, these addresses will be offset)
>>>
>>> However, the R_386_PC32 is an enigma - The call is already relative -
>>> there is no need to relocate it at all (call is a position independent
>>> opcode because it is a relative jump!)
>>>
>>
>> Yes, but printf is defined in glibc so the app needs to relocate the call
>> to glibc.
>
> Actually, the reason the call is relocatable is that the compiler DOESN'T
>  KNOW where printf is at all. If it is in a library, it will not be in the
> text segment and must be relocated accordingly. It may be in  a different
> segment for some reason. In any case, the compiler doesn't know the address
> in the image where printf resides, so it needs a relocation entry to get the
> value filled in at link time. After the value is filled in, if the
> referenced symbol is in the same segment (probably .text) as the point of
> reference, the relocation reference is probably of no more use. However,
> there is no rule that says the linker must delete the reference from the
> relocation list.
>>
>>  U-boot has all it needs so there you should not have PC32 I think.
>> Try defining a local static function. For non static functions
>> you may need to define visibility=hidden and/or -Bsymbolic too.
>>
>
> Won't help. Any symbols referenced but not defined locally are relocatable.
> After linking, they MAY, but need not, go away.
>>
>> You also need to look at the img after final linking.
>>
>
> After linking, if the symbol is defined, the R_386_PC32 is no longer
> important UNLESS the symbol referenced is in a different segment AND the
> segments are relocated with different offsets from each other than
> originally linked. For this reason, I think the linker will not discard
> these relocations. If we are not relocating the segments with different
> relative offsets, we can ignore these relocations as the change in offset
> will come out to be zero anyway. However, if you process them normally, you
> will just add 0 and nothing will change.
>>
>>
>>>
>>> Will all R_386_PC32 be like this? Can I simply ignore them all? If so,
>>> why
>>> do they even need to be generated?
>>>
>>
>> Hopefully you won't have any.
>
> I think they may still be there, because we ask the linker to preserve
> relocation information. However, if the entire image is being relocated, not
> changing the order or relative offset of any segments, they can be ignored,
> because the relative values will not change. It will be interesting to know
> if they remain or if the linker drops them out. For references in the same
> segment, we can hope that they get dropped. For references across segments
> (if any), or any undefined symbols, they will remain.
>>
>>  Not sure about weak functions though. These might
>> need PC32 relocs in some cases.
>>
>
> There can be PC32 relocs referencing the weak symbol, but that symbol may be
> undefined.
>>
>> Also, if you look at _dl_do_reloc() in uClibc/ldso/ldso/i386/elfinterp.c I
>> think
>> you can replace symbol_addr with relocation offset.
>>
>
> I agree, in the case where you are moving the entire image and ignoring PC32 relocs.
>
> Best Regards,
> Bill Campbell
>>
>>    Jocke
>>

Apologies if this is getting way off-topic for a simple boot loader, but
this is information I have gathered from far and wide over the net. I am
surprised that there isn't a web site out there on 'How to create a
relocatable boot loader'...

OK, it's all starting to come together now - It helps when you look at the
right files ;)

Firstly, u-boot.map

                0x380589a0                __rel_dyn_start = .

.rel.dyn        0x380589a0     0x42b0
 *(.rel.dyn)
 .rel.got       0x00000000        0x0 cpu/i386/start.o
 .rel.plt       0x00000000        0x0 cpu/i386/start.o
 .rel.text      0x380589a0     0x2e28 cpu/i386/start.o
 .rel.start16   0x3805b7c8       0x10 cpu/i386/start.o
 .rel.data      0x3805b7d8      0xc18 cpu/i386/start.o
 .rel.rodata    0x3805c3f0      0x360 cpu/i386/start.o
 .rel.u_boot_cmd
                0x3805c750      0x500 cpu/i386/start.o
                0x3805cc50                __rel_dyn_end = .


And the output of readelf...

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        38040000 001000 0118a4 00  AX  0   0  4
  [ 2] .rel.text         REL             00000000 066c68 005d00 08     40   1  4
  [ 3] .rodata           PROGBITS        380518a4 0128a4 005da5 00   A  0   0  4
  [ 4] .rel.rodata       REL             00000000 06c968 000360 08     40   3  4
  [ 5] .interp           PROGBITS        38057649 018649 000013 00   A  0   0  1
  [ 6] .dynstr           STRTAB          3805765c 01865c 0001ee 00   A  0   0  1
  [ 7] .hash             HASH            3805784c 01884c 0000cc 04   A 11   0  4
  [ 8] .data             PROGBITS        38057918 018918 000a3c 00  WA  0   0  4
  [ 9] .rel.data         REL             00000000 06ccc8 000c18 08     40   8  4
  [10] .got.plt          PROGBITS        38058354 019354 00000c 04  WA  0   0  4
  [11] .dynsym           DYNSYM          38058360 019360 000200 10   A  6   1  4
  [12] .dynamic          DYNAMIC         38058560 019560 000080 08  WA  6   0  4
  [13] .u_boot_cmd       PROGBITS        380585e0 0195e0 0003c0 00  WA  0   0  4
  [14] .rel.u_boot_cmd   REL             00000000 06d8e0 000500 08     40  13  4
  [15] .bss              NOBITS          3805cc50 01ec50 001a34 00  WA  0   0  4
  [16] .bios             PROGBITS        00000000 01e000 00053e 00  AX  0   0  1
  [17] .rel.bios         REL             00000000 06dde0 0000c0 08     40  16  4
  [18] .rel.dyn          REL             380589a0 0199a0 0042b0 08   A 11   0  4
  [19] .start16          PROGBITS        0000f800 01e800 000110 00  AX  0   0  1
  [20] .rel.start16      REL             00000000 06dea0 000038 08     40  19  4
  [21] .resetvec         PROGBITS        0000fff0 01eff0 000010 00  AX  0   0  1
  [22] .rel.resetvec     REL             00000000 06ded8 000008 08     40  21  4

...

Relocation section '.rel.text' at offset 0x66c68 contains 2976 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
38040010  00000101 R_386_32          38040000   .text
3804001e  00000101 R_386_32          38040000   .text
38040028  00000101 R_386_32          38040000   .text
3804003f  00000101 R_386_32          38040000   .text
38040051  00000101 R_386_32          38040000   .text
38040075  00000101 R_386_32          38040000   .text
38040085  00000101 R_386_32          38040000   .text
3804009d  0003e602 R_386_PC32        380403fa   load_uboot
380400a6  00000101 R_386_32          38040000   .text
38040015  00029f02 R_386_PC32        3804bdd8   early_board_init
38040023  0003f702 R_386_PC32        3804bdda   show_boot_progress_asm

...

Relocation section '.rel.rodata' at offset 0x6c968 contains 108 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
38051908  00000201 R_386_32          380518a4   .rodata
38051938  00000201 R_386_32          380518a4   .rodata
38051968  00000201 R_386_32          380518a4   .rodata
38051998  00000201 R_386_32          380518a4   .rodata
380519c8  00000201 R_386_32          380518a4   .rodata
380519f8  00000201 R_386_32          380518a4   .rodata

...

Relocation section '.rel.dyn' at offset 0x199a0 contains 2134 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000f838  00000008 R_386_RELATIVE
0000f846  00000008 R_386_RELATIVE
38040010  00000008 R_386_RELATIVE
3804001e  00000008 R_386_RELATIVE
38040028  00000008 R_386_RELATIVE
3804003f  00000008 R_386_RELATIVE
38040051  00000008 R_386_RELATIVE
38040075  00000008 R_386_RELATIVE
38040085  00000008 R_386_RELATIVE
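
For anyone reading the listings above: the Info column packs the symbol index and the relocation type into one word. A minimal sketch of the decoding, with the type numbers taken from the i386 ELF spec and the helper names made up here (a real build would use the ELF32_R_SYM / ELF32_R_TYPE macros from elf.h instead):

```c
#include <stdint.h>

/* i386 relocation type numbers (defined locally so that this sketch
 * does not depend on a system elf.h). */
#define R_386_32        1   /* absolute 32-bit address: S + A     */
#define R_386_PC32      2   /* PC-relative 32-bit: S + A - P      */
#define R_386_RELATIVE  8   /* image base + addend: B + A         */

/* Symbol table index: the upper 24 bits of r_info. */
static uint32_t rel_sym(uint32_t r_info)
{
    return r_info >> 8;
}

/* Relocation type: the low byte of r_info. */
static uint8_t rel_type(uint32_t r_info)
{
    return (uint8_t)r_info;
}
```

So 0003e602 is type 2 (R_386_PC32) against symbol 0x3e6, while every 00000008 entry in .rel.dyn is type 8 (R_386_RELATIVE) with symbol index 0. That is why no symbol table lookup is needed to process them.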

Notice that, apart from .rel.dyn, none of the .rel.* sections have the
A (Allocated) flag set - they do not end up in the stripped binary image.
.rel.dyn is allocated in the binary image: all the R_386_PC32 entries
from the other .rel sections are discarded, and the R_386_32 entries have
been 'converted' to R_386_RELATIVE, which are simple to adjust (locate in
memory and adjust by the relocation offset).

The relocation fixup is really easy:

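	Elf32_Rel *rel_dyn_start = (Elf32_Rel *)&__rel_dyn_start;
	Elf32_Rel *rel_dyn_end = (Elf32_Rel *)&__rel_dyn_end;
	Elf32_Rel *re;

	/* Walk every entry in .rel.dyn (bounded by the linker-script symbols) */
	for (re = rel_dyn_start; re < rel_dyn_end; re++)
	{
		/* Skip entries that patch locations below TEXT_BASE */
		if (re->r_offset >= TEXT_BASE)
			/* Only adjust stored values at or above TEXT_BASE */
			if (*(ulong *)re->r_offset >= TEXT_BASE)
				/* Patch the word at its relocated address */
				*(ulong *)(re->r_offset - rel_offset) -= (Elf32_Addr)rel_offset;
	}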
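
The same idea as a self-contained sketch that can be compiled and tried on a host machine. It works on a byte buffer instead of live memory, and it adds a delta (new base minus link base) rather than subtracting an offset as the loop above does. The rel_entry type, the function name and the bounds checks are illustrative assumptions, not the code from the x86 tree.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as Elf32_Rel, declared locally so no elf.h is needed. */
typedef struct {
    uint32_t r_offset;  /* link-time address of the word to patch */
    uint32_t r_info;    /* relocation type in the low byte        */
} rel_entry;

/* Apply R_386_RELATIVE (type 8) entries to a copy of the image.
 *   copy      - bytes of the image at its new location
 *   size      - image size in bytes
 *   link_base - address the image was linked at
 *   delta     - new base minus link_base, modulo 2^32
 * Returns the number of words patched. */
static size_t relocate_copy(uint8_t *copy, size_t size, uint32_t link_base,
                            uint32_t delta, const rel_entry *rel, size_t n)
{
    size_t patched = 0;

    if (size < 4)
        return 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t off, word;

        if ((rel[i].r_info & 0xff) != 8)
            continue;           /* not R_386_RELATIVE */
        if (rel[i].r_offset < link_base)
            continue;           /* target lies below the image */
        off = rel[i].r_offset - link_base;
        if (off > size - 4)
            continue;           /* target lies beyond the image */

        /* memcpy avoids unaligned and aliasing problems on the host */
        memcpy(&word, copy + off, sizeof word);
        word += delta;
        memcpy(copy + off, &word, sizeof word);
        patched++;
    }
    return patched;
}
```

An entry such as 0000f838 from the .rel.dyn listing falls below the link base and is skipped here, which is roughly the job the first TEXT_BASE test does in the loop above.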

The size penalty is ~17kB of extra data (which is not copied to RAM) and
a tiny amount of relocation code (easily offset by removal of other fixups
such as the command table fixup).
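
As a quick cross-check of that figure against the readelf output: each Elf32_Rel entry is 8 bytes (the ES column), so the section sizes and entry counts should agree. A throwaway helper, with a made-up name, that does the division:

```c
#include <stdint.h>

/* sizeof(Elf32_Rel): a 4-byte r_offset plus a 4-byte r_info. */
enum { REL_ENTRY_SIZE = 8 };

/* Number of REL entries in a section of the given size in bytes. */
static uint32_t rel_entries(uint32_t section_bytes)
{
    return section_bytes / REL_ENTRY_SIZE;
}
```

rel_entries(0x42b0) is 2134, matching the .rel.dyn entry count, and 0x42b0 is 17072 bytes, which is where the ~17kB comes from. Likewise 0x5d00 gives the 2976 entries reported for .rel.text.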

And without using the pic flag in gcc, there is no GOT and no associated
performance penalty.

Thanks for everyone's help (especially Jocke and Bill)

Regards,

Graeme


* [U-Boot] Relocation size penalty calculation
  2009-10-17  5:17                                                 ` Graeme Russ
@ 2009-10-17 12:32                                                   ` Joakim Tjernlund
  2009-10-17 12:59                                                   ` J. William Campbell
  1 sibling, 0 replies; 47+ messages in thread
From: Joakim Tjernlund @ 2009-10-17 12:32 UTC (permalink / raw)
  To: u-boot

Graeme Russ <graeme.russ@gmail.com> wrote on 17/10/2009 07:17:04:
>

[SNIP]

>
> Apologies if this is getting way off-topic for a simple boot loader, but
> this is information I have gathered from far and wide over the net. I am
> surprised that there isn't a web site out there on 'How to create a
> relocatable boot loader'...

:), now you can write one :)

>
> OK, it's all starting to come together now - It helps when you look at the
> right files ;)
>
> Firstly, u-boot.map
>
>                 0x380589a0                __rel_dyn_start = .
>
> .rel.dyn        0x380589a0     0x42b0
>  *(.rel.dyn)
>  .rel.got       0x00000000        0x0 cpu/i386/start.o
>  .rel.plt       0x00000000        0x0 cpu/i386/start.o
>  .rel.text      0x380589a0     0x2e28 cpu/i386/start.o
>  .rel.start16   0x3805b7c8       0x10 cpu/i386/start.o
>  .rel.data      0x3805b7d8      0xc18 cpu/i386/start.o
>  .rel.rodata    0x3805c3f0      0x360 cpu/i386/start.o
>  .rel.u_boot_cmd
>                 0x3805c750      0x500 cpu/i386/start.o
>                 0x3805cc50                __rel_dyn_end = .
>
>
> And the output of readelf...
>
> Section Headers:
>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>   [ 0]                   NULL            00000000 000000 000000 00      0   0  0
>   [ 1] .text             PROGBITS        38040000 001000 0118a4 00  AX  0   0  4
>   [ 2] .rel.text         REL             00000000 066c68 005d00 08     40   1  4
>   [ 3] .rodata           PROGBITS        380518a4 0128a4 005da5 00   A  0   0  4
>   [ 4] .rel.rodata       REL             00000000 06c968 000360 08     40   3  4
>   [ 5] .interp           PROGBITS        38057649 018649 000013 00   A  0   0  1
>   [ 6] .dynstr           STRTAB          3805765c 01865c 0001ee 00   A  0   0  1
>   [ 7] .hash             HASH            3805784c 01884c 0000cc 04   A 11   0  4
>   [ 8] .data             PROGBITS        38057918 018918 000a3c 00  WA  0   0  4
>   [ 9] .rel.data         REL             00000000 06ccc8 000c18 08     40   8  4
>   [10] .got.plt          PROGBITS        38058354 019354 00000c 04  WA  0   0  4
>   [11] .dynsym           DYNSYM          38058360 019360 000200 10   A  6   1  4
>   [12] .dynamic          DYNAMIC         38058560 019560 000080 08  WA  6   0  4
>   [13] .u_boot_cmd       PROGBITS        380585e0 0195e0 0003c0 00  WA  0   0  4
>   [14] .rel.u_boot_cmd   REL             00000000 06d8e0 000500 08     40  13  4
>   [15] .bss              NOBITS          3805cc50 01ec50 001a34 00  WA  0   0  4
>   [16] .bios             PROGBITS        00000000 01e000 00053e 00  AX  0   0  1
>   [17] .rel.bios         REL             00000000 06dde0 0000c0 08     40  16  4
>   [18] .rel.dyn          REL             380589a0 0199a0 0042b0 08   A 11   0  4
>   [19] .start16          PROGBITS        0000f800 01e800 000110 00  AX  0   0  1
>   [20] .rel.start16      REL             00000000 06dea0 000038 08     40  19  4
>   [21] .resetvec         PROGBITS        0000fff0 01eff0 000010 00  AX  0   0  1
>   [22] .rel.resetvec     REL             00000000 06ded8 000008 08     40  21  4
>
> ...
>
> Relocation section '.rel.text' at offset 0x66c68 contains 2976 entries:
>  Offset     Info    Type            Sym.Value  Sym. Name
> 38040010  00000101 R_386_32          38040000   .text
> 3804001e  00000101 R_386_32          38040000   .text
> 38040028  00000101 R_386_32          38040000   .text
> 3804003f  00000101 R_386_32          38040000   .text
> 38040051  00000101 R_386_32          38040000   .text
> 38040075  00000101 R_386_32          38040000   .text
> 38040085  00000101 R_386_32          38040000   .text
> 3804009d  0003e602 R_386_PC32        380403fa   load_uboot
> 380400a6  00000101 R_386_32          38040000   .text
> 38040015  00029f02 R_386_PC32        3804bdd8   early_board_init
> 38040023  0003f702 R_386_PC32        3804bdda   show_boot_progress_asm
>
> ...
>
> Relocation section '.rel.rodata' at offset 0x6c968 contains 108 entries:
>  Offset     Info    Type            Sym.Value  Sym. Name
> 38051908  00000201 R_386_32          380518a4   .rodata
> 38051938  00000201 R_386_32          380518a4   .rodata
> 38051968  00000201 R_386_32          380518a4   .rodata
> 38051998  00000201 R_386_32          380518a4   .rodata
> 380519c8  00000201 R_386_32          380518a4   .rodata
> 380519f8  00000201 R_386_32          380518a4   .rodata
>
> ...
>
> Relocation section '.rel.dyn' at offset 0x199a0 contains 2134 entries:
>  Offset     Info    Type            Sym.Value  Sym. Name
> 0000f838  00000008 R_386_RELATIVE
> 0000f846  00000008 R_386_RELATIVE
> 38040010  00000008 R_386_RELATIVE
> 3804001e  00000008 R_386_RELATIVE
> 38040028  00000008 R_386_RELATIVE
> 3804003f  00000008 R_386_RELATIVE
> 38040051  00000008 R_386_RELATIVE
> 38040075  00000008 R_386_RELATIVE
> 38040085  00000008 R_386_RELATIVE
>
> Notice that, apart from .rel.dyn, none of the .rel.* sections have the
> A (Allocated) flag set - they do not end up in the stripped binary image.
> .rel.dyn is allocated in the binary image: all the R_386_PC32 entries
> from the other .rel sections are discarded, and the R_386_32 entries have
> been 'converted' to R_386_RELATIVE, which are simple to adjust (locate in
> memory and adjust by the relocation offset).

Ah, they are converted to relative. Wonder if all archs do this?
If so, one will only need two reloc functions, one for Rel and
one for Rela relocs.

>
> The relocation fixup is really easy:
>
>    Elf32_Rel *rel_dyn_start = (Elf32_Rel *)&__rel_dyn_start;
>    Elf32_Rel *rel_dyn_end = (Elf32_Rel *)&__rel_dyn_end;
>    Elf32_Rel *re;
>
>    for (re = rel_dyn_start; re < rel_dyn_end; re++)
>    {
>       if (re->r_offset >= TEXT_BASE)
>          if (*(ulong *)re->r_offset >= TEXT_BASE)
>             *(ulong *)(re->r_offset - rel_offset) -= (Elf32_Addr)rel_offset;
>    }

Do you need the TEXT_BASE stuff or is it just a precaution?
Not sure if you need some test for NULL to handle weak undefined symbols though.

> The size penalty is ~17kB of extra data (which is not copied to RAM) and
> a tiny amount of relocation code (easily offset by removal of other fixups
> such as the command table fixup).

17kB, how does that compare to the -fPIC version?

>
> And without using the pic flag in gcc, there is no GOT and no associated
> performance penalty.

Yep :)

>
> Thanks for everyone's help (especially Jocke and Bill)

NP, will we see a patch soon?

 Jocke


* [U-Boot] Relocation size penalty calculation
  2009-10-17  5:17                                                 ` Graeme Russ
  2009-10-17 12:32                                                   ` Joakim Tjernlund
@ 2009-10-17 12:59                                                   ` J. William Campbell
  2009-10-17 21:29                                                     ` Graeme Russ
  1 sibling, 1 reply; 47+ messages in thread
From: J. William Campbell @ 2009-10-17 12:59 UTC (permalink / raw)
  To: u-boot

Graeme Russ wrote:
> On Thu, Oct 15, 2009 at 3:45 AM, J. William Campbell
> <jwilliamcampbell@comcast.net> wrote:
>   
>> Joakim Tjernlund wrote:
>>     
>   
<megasnip>

> Apologies if this is getting way off-topic for a simple boot loader, but
> this is information I have gathered from far and wide over the net. I am
> surprised that there isn't a web site out there on 'How to create a
> relocatable boot loader'...
>
> OK, it's all starting to come together now - It helps when you look at the
> right files ;)
>
> Firstly, u-boot.map
>
>                 0x380589a0                __rel_dyn_start = .
>
> .rel.dyn        0x380589a0     0x42b0
>  *(.rel.dyn)
>  .rel.got       0x00000000        0x0 cpu/i386/start.o
>  .rel.plt       0x00000000        0x0 cpu/i386/start.o
>  .rel.text      0x380589a0     0x2e28 cpu/i386/start.o
>  .rel.start16   0x3805b7c8       0x10 cpu/i386/start.o
>  .rel.data      0x3805b7d8      0xc18 cpu/i386/start.o
>  .rel.rodata    0x3805c3f0      0x360 cpu/i386/start.o
>  .rel.u_boot_cmd
>                 0x3805c750      0x500 cpu/i386/start.o
>                 0x3805cc50                __rel_dyn_end = .
>
>
> And the output of readelf...
>
> Section Headers:
>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>   [ 0]                   NULL            00000000 000000 000000 00      0   0  0
>   [ 1] .text             PROGBITS        38040000 001000 0118a4 00  AX  0   0  4
>   [ 2] .rel.text         REL             00000000 066c68 005d00 08     40   1  4
>   [ 3] .rodata           PROGBITS        380518a4 0128a4 005da5 00   A  0   0  4
>   [ 4] .rel.rodata       REL             00000000 06c968 000360 08     40   3  4
>   [ 5] .interp           PROGBITS        38057649 018649 000013 00   A  0   0  1
>   [ 6] .dynstr           STRTAB          3805765c 01865c 0001ee 00   A  0   0  1
>   [ 7] .hash             HASH            3805784c 01884c 0000cc 04   A 11   0  4
>   [ 8] .data             PROGBITS        38057918 018918 000a3c 00  WA  0   0  4
>   [ 9] .rel.data         REL             00000000 06ccc8 000c18 08     40   8  4
>   [10] .got.plt          PROGBITS        38058354 019354 00000c 04  WA  0   0  4
>   [11] .dynsym           DYNSYM          38058360 019360 000200 10   A  6   1  4
>   [12] .dynamic          DYNAMIC         38058560 019560 000080 08  WA  6   0  4
>   [13] .u_boot_cmd       PROGBITS        380585e0 0195e0 0003c0 00  WA  0   0  4
>   [14] .rel.u_boot_cmd   REL             00000000 06d8e0 000500 08     40  13  4
>   [15] .bss              NOBITS          3805cc50 01ec50 001a34 00  WA  0   0  4
>   [16] .bios             PROGBITS        00000000 01e000 00053e 00  AX  0   0  1
>   [17] .rel.bios         REL             00000000 06dde0 0000c0 08     40  16  4
>   [18] .rel.dyn          REL             380589a0 0199a0 0042b0 08   A 11   0  4
>   [19] .start16          PROGBITS        0000f800 01e800 000110 00  AX  0   0  1
>   [20] .rel.start16      REL             00000000 06dea0 000038 08     40  19  4
>   [21] .resetvec         PROGBITS        0000fff0 01eff0 000010 00  AX  0   0  1
>   [22] .rel.resetvec     REL             00000000 06ded8 000008 08     40  21  4
>
> ...
>
> Relocation section '.rel.text' at offset 0x66c68 contains 2976 entries:
>  Offset     Info    Type            Sym.Value  Sym. Name
> 38040010  00000101 R_386_32          38040000   .text
> 3804001e  00000101 R_386_32          38040000   .text
> 38040028  00000101 R_386_32          38040000   .text
> 3804003f  00000101 R_386_32          38040000   .text
> 38040051  00000101 R_386_32          38040000   .text
> 38040075  00000101 R_386_32          38040000   .text
> 38040085  00000101 R_386_32          38040000   .text
> 3804009d  0003e602 R_386_PC32        380403fa   load_uboot
> 380400a6  00000101 R_386_32          38040000   .text
> 38040015  00029f02 R_386_PC32        3804bdd8   early_board_init
> 38040023  0003f702 R_386_PC32        3804bdda   show_boot_progress_asm
>
> ...
>
> Relocation section '.rel.rodata' at offset 0x6c968 contains 108 entries:
>  Offset     Info    Type            Sym.Value  Sym. Name
> 38051908  00000201 R_386_32          380518a4   .rodata
> 38051938  00000201 R_386_32          380518a4   .rodata
> 38051968  00000201 R_386_32          380518a4   .rodata
> 38051998  00000201 R_386_32          380518a4   .rodata
> 380519c8  00000201 R_386_32          380518a4   .rodata
> 380519f8  00000201 R_386_32          380518a4   .rodata
>
> ...
>
> Relocation section '.rel.dyn' at offset 0x199a0 contains 2134 entries:
>  Offset     Info    Type            Sym.Value  Sym. Name
> 0000f838  00000008 R_386_RELATIVE
> 0000f846  00000008 R_386_RELATIVE
> 38040010  00000008 R_386_RELATIVE
> 3804001e  00000008 R_386_RELATIVE
> 38040028  00000008 R_386_RELATIVE
> 3804003f  00000008 R_386_RELATIVE
> 38040051  00000008 R_386_RELATIVE
> 38040075  00000008 R_386_RELATIVE
> 38040085  00000008 R_386_RELATIVE
>
> Notice that, apart from .rel.dyn, none of the .rel.* sections have the
> A (Allocated) flag set, so they do not end up in the stripped binary image.
> .rel.dyn is allocated in the binary image: the R_386_PC32 entries from the
> other .rel sections are discarded, and the R_386_32 entries have been
> 'converted' to R_386_RELATIVE, which are simple to adjust (locate in memory
> and adjust by the relocation offset).
>
> The relocation fixup is really easy:
>
> 	Elf32_Rel *rel_dyn_start = (Elf32_Rel *)&__rel_dyn_start;
> 	Elf32_Rel *rel_dyn_end = (Elf32_Rel *)&__rel_dyn_end;
> 	Elf32_Rel *re;
>
> 	for (re = rel_dyn_start; re < rel_dyn_end; re++)
> 	{
> 		if (re->r_offset >= TEXT_BASE)
> 			if (*(ulong *)re->r_offset >= TEXT_BASE)
> 				*(ulong *)(re->r_offset - rel_offset) -= (Elf32_Addr)rel_offset;
> 	}
>
> The size penalty is ~17kB of extra data (which is not copied to RAM) and
> a tiny amount of relocation code (easily offset by removal of other fixups
> such as the command table fixup).
>
> And without using the pic flag in gcc, there is no GOT and no associated
> performance penalty.
>
> Thanks for everyone's help (especially Jocke and Bill)
>   
Great work Graeme. You have taken a lot of conjecture and guessing and 
converted it to actual truth!

In line with your comment about -fpic, the .text segment size goes from 
000137fc down to 000118a4, or about an 8k reduction in size. The -fpic 
build also contains a .rel.dyn section, which presumably needs to be 
processed the same way as in the non -fpic case (otherwise, why would it 
be there?). The size of that "residual" .rel.dyn was 00001228, or 4.6k. 
This means that the size penalty for not using -fpic is only about 3k 
bytes total in the image, and the RAM footprint is actually smaller than 
with -fpic. So now, after Graeme's work here, it is easily possible to 
support three different u-boot configurations: absolute, relocatable, 
and relocatable with -fpic. If there are any size maniacs out there, we 
can reduce the size of the relocation table at the expense of some 
post-processing. These days, 9k of flash vs 4.5k of flash doesn't seem 
important, but I imagine if you are right against the stops on an 
existing product it can be very important!
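
As a rough cross-check of the arithmetic above, here is a minimal sketch that uses only the .text and .rel.dyn sizes quoted in this thread. The helper names are made up for illustration, and other sections that differ between the two builds (.got, .dynsym and so on) are deliberately not counted, so the net figure is only a ballpark and will not match the estimate above exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Section sizes as quoted in this thread (x86 build). */
enum {
	TEXT_FPIC      = 0x137fc, /* .text built with -fpic */
	TEXT_NO_FPIC   = 0x118a4, /* .text built without -fpic */
	RELDYN_FPIC    = 0x1228,  /* .rel.dyn built with -fpic */
	RELDYN_NO_FPIC = 0x42b0   /* .rel.dyn built without -fpic */
};

/* Bytes of .text saved by dropping -fpic. */
static int32_t text_saving(void)
{
	return TEXT_FPIC - TEXT_NO_FPIC;
}

/* Extra bytes of .rel.dyn incurred by dropping -fpic. */
static int32_t reldyn_growth(void)
{
	return RELDYN_NO_FPIC - RELDYN_FPIC;
}

/* Net image growth, counting these two sections only. */
static int32_t net_penalty(void)
{
	return reldyn_growth() - text_saving();
}

/* Number of 8-byte Elf32_Rel entries in a REL section of 'size' bytes. */
static uint32_t rel_entries(uint32_t size)
{
	assert(size % 8 == 0);
	return size / 8;
}
```

With these inputs, rel_entries(0x42b0) agrees with the 2134 entries that readelf reports for .rel.dyn, which is a useful sanity check that each entry really is 8 bytes.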

It will be interesting to see similar numbers for other architectures. I 
expect similar results, but you never know. PPC relocation entries are 
larger, so they become more of an issue.

Still more questions for Graeme if he will indulge me! Are the if 
statements in the relocation code ever false? Are there relocations for 
stuff below TEXT_BASE in the input binary? If so, do you have any idea 
why? Not that two if statements are a big deal, it is just that I can't 
explain why there would be any relocations below TEXT_BASE, and I can't 
explain why there would be any relocatable references to anything below 
TEXT_BASE. I assume this might be related to not relocating NULL 
pointers. That would be reflected in the innermost if statement. I would 
not expect there to be any such references, as gas does know the 
relocation attributes of initialized data, and NULL is absolute(?). Also, 
if a function is not defined (weak or otherwise), the loader should give 
it an address of absolute 0, which would also not generate a relocation 
entry(?). It would be interesting to intentionally call an undefined 
function in u-boot and see if the call ends up relocatable. It should 
not, and if it does we should file a bug report for ld!

 Thanks again Graeme!

Best Regards,
Bill Campbell
> Regards,
>
> Graeme
>
>   


* [U-Boot] Relocation size penalty calculation
  2009-10-17 12:59                                                   ` J. William Campbell
@ 2009-10-17 21:29                                                     ` Graeme Russ
  0 siblings, 0 replies; 47+ messages in thread
From: Graeme Russ @ 2009-10-17 21:29 UTC (permalink / raw)
  To: u-boot

On Sat, Oct 17, 2009 at 11:59 PM, J. William Campbell
<jwilliamcampbell@comcast.net> wrote:
> Graeme Russ wrote:
>>
>> On Thu, Oct 15, 2009 at 3:45 AM, J. William Campbell
>> <jwilliamcampbell@comcast.net> wrote:
>>
>>>
>>> Joakim Tjernlund wrote:
>>>
>>
>>
>
> <megasnip>
>

[Yawn... YAS (Yet Another Snip) ;)]

>>
>> The relocation fixup is really easy:
>>
>>        Elf32_Rel *rel_dyn_start = (Elf32_Rel *)&__rel_dyn_start;
>>        Elf32_Rel *rel_dyn_end = (Elf32_Rel *)&__rel_dyn_end;
>>        Elf32_Rel *re;
>>
>>        for (re = rel_dyn_start; re < rel_dyn_end; re++)
>>        {
>>                if (re->r_offset >= TEXT_BASE)
>>                        if (*(ulong *)re->r_offset >= TEXT_BASE)
>>                                *(ulong *)(re->r_offset - rel_offset) -=
>>                                        (Elf32_Addr)rel_offset;
>>        }
>>
>> The size penalty is ~17kB of extra data (which is not copied to RAM) and
>> a tiny amount of relocation code (easily offset by removal of other fixups
>> such as the command table fixup).
>>
>> And without using the pic flag in gcc, there is no GOT and no associated
>> performance penalty.
>>
>> Thanks for everyone's help (especially Jocke and Bill)
>>
>
> Great work Graeme. You have taken a lot of conjecture and guessing and
> converted it to actual truth!
>
> In line with your comment about -fpic, the .text segment size goes from
> 000137fc down to 000118a4, or about an 8 k reduction in size. -fpic also
> contains a .rel_dyn segment, that presumably needs to be processed the same
> way as in the non -fpic case (otherwise, why would it be there?). The size
> of the "residual" .rel_dyn was 00001228, or 4.6 k. This means that the size
> penalty for not using -fpic is only about 3k bytes total in the image, and
> the ram footprint is actually smaller than with -fpic. So now, after

Yes, especially on the x86 because with -fpic, the x86 needs to do a CALL/POP
in the beginning of each function to determine the current IP in order to
calculate absolute addresses using the GOT (ouch!)

> Graeme's work here, it is easily possible to support three different u-boot
> configurations, absolute, relocatable, and relocatable with -fpic. If there
> are any size maniacs out there, we can reduce the size of the relocation
> table at the expense of some post-processing. These days, 9k of flash vs
> 4.5k of flash doesn't seem important, but I imagine if you are right against
> the stops on an existing product it can be very important!
>
> It will be interesting to see similar numbers for other architectures. I
> expect similar results, but you never know. PPC relocation entries are
> larger, so they become more of an issue.
>
> Still more questions for Graeme if he will indulge me! Are the if statements
> in the relocation code ever false? Are there relocations for stuff below
> TEXT_BASE in the input binary? If so, do you have any idea why? Not that two
> if statements are a big deal, it is just that I can't explain why there would
> be any relocations below TEXT_BASE, and I can't explain why there would be
> any relocatable references to anything below TEXT_BASE. I assume this
> might be related to not relocating NULL pointers. That would be reflected in
> the innermost if statement. I would not expect there to be any such
> references, as gas does know the relocation attributes of initialized data,
> and NULL is absolute(?). Also, if a function is not defined (weak or
> otherwise), the loader should give it an address of absolute 0, which would
> also not generate a relocation entry(?). It would be interesting to
> intentionally call an undefined function in u-boot and see if the call ends
> up relocatable. It should not, and if it does we should file a bug report
> for ld!

Apart from NULL pointers, there are some peculiarities for x86 that have
to be dealt with. There are two sections (for BIOS and the real mode
trampoline) which get linked at a hard-coded memory location in the low
area of memory (<16M). The TEXT_BASE checks are there to ensure these do
not get trampled.
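
To make the two checks concrete, here is a minimal host-side sketch, not the actual U-Boot code. It runs the same kind of loop over a plain byte buffer standing in for the relocated image. The names rel32_t and apply_relative_relocs are made up for illustration, and delta is assumed to be (new address - TEXT_BASE), which is the opposite sign convention to rel_offset in the loop quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define R_386_RELATIVE       8
#define ELF32_R_TYPE(info)   ((info) & 0xff)

/* Same layout as Elf32_Rel; renamed so this sketch does not need elf.h. */
typedef struct {
	uint32_t r_offset; /* link-time address of the word to patch */
	uint32_t r_info;   /* (symbol index << 8) | relocation type */
} rel32_t;

/*
 * image:     copy of the binary, already moved to its new home
 * size:      number of bytes in image
 * text_base: link-time address corresponding to image[0]
 * delta:     new address minus text_base (assumed convention)
 * Returns the number of words actually patched.
 */
static unsigned apply_relative_relocs(uint8_t *image, uint32_t size,
				      uint32_t text_base, uint32_t delta,
				      const rel32_t *rel, unsigned count)
{
	unsigned patched = 0;

	assert(size >= 4);
	for (unsigned i = 0; i < count; i++) {
		uint32_t off = rel[i].r_offset;
		uint32_t value;

		if (ELF32_R_TYPE(rel[i].r_info) != R_386_RELATIVE)
			continue;
		/* Check 1: the slot lives in a low, fixed-address section
		 * (.bios, .start16) or otherwise outside the image. */
		if (off < text_base || off - text_base > size - 4)
			continue;
		memcpy(&value, image + (off - text_base), sizeof(value));
		/* Check 2: the value is NULL or points at a low fixed
		 * address, so it must not move. */
		if (value < text_base)
			continue;
		value += delta;
		memcpy(image + (off - text_base), &value, sizeof(value));
		patched++;
	}
	return patched;
}
```

In this sketch an entry such as 0000f838 from the .rel.dyn dump fails the first check and is skipped, and a slot holding 0 fails the second, so only pointers into the relocated image are adjusted.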

>
> Thanks again Graeme!
>

NP - Just scratching an itch

> Best Regards,
> Bill Campbell
>>
>> Regards,
>>
>> Graeme
>>
>>
>
>

