* [PATCH] [POWERPC] Optimize counting distinct entries in the relocation sections
@ 2007-11-08 23:36 Emil Medve
  2007-11-12  6:00 ` Paul Mackerras
  0 siblings, 1 reply; 8+ messages in thread
From: Emil Medve @ 2007-11-08 23:36 UTC (permalink / raw)
  To: paulus, rusty, ntl, sfr, behlendorf1, galak, linuxppc-dev,
	linuxppc-embedded
  Cc: Emil Medve

When a module has relocation sections with tens of thousands of entries,
counting just the distinct/unique entries (i.e. skipping duplicates) at load
time can take tens of seconds, up to minutes. The sore point is the
count_relocs() function, which is called as part of the architecture-specific
module loading path:

	-> load_module()			generic
	   -> module_frob_arch_sections()	arch specific
	      -> get_plt_size()			32-bit
	      -> get_stubs_size()		64-bit
	         -> count_relocs()

(Not sure why the relocation tables could contain lots of duplicates and why
they are not trimmed at compile time by the linker. In some test cases, out of
35K relocation entries only 1.5K were distinct/unique)

The previous counting algorithm had O(n^2) complexity. Basically two
solutions were proposed on the mailing list: a hash based approach and a sort
based approach.
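
(For scale: with n = 35,000 the quadratic scan performs on the order of
n^2 / 2 ~= 600 million comparisons, while sorting first costs only about
n * log2(n) ~= 35,000 * 15 ~= 530,000.)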

The hash based approach is the fastest (O(n)), but it needs additional memory,
and in certain corner cases it can consume a lot of memory because the hash
degenerates. One such proposal was submitted here:

http://ozlabs.org/pipermail/linuxppc-dev/2007-June/037641.html

In this proposal, the symbol and the addend are hashed to generate a key, and
a pointer to the relocation entry is stored under it. The hash is implemented
as a linked list of memory pages with PAGE_SIZE / sizeof(Elfxx_Rela *) entries
each. When a new distinct relocation entry collides in all the existing pages,
a new page is added to the list to accommodate it.

For 32-bit PowerPCs with 4K pages, a page can accommodate 1K pointers to
relocation entries. In the 35K-entry scenario above, as many as six (6) pages
could be allocated, using 24K of extra memory during the module load.
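
A minimal user-space sketch of that scheme (illustrative only: the names,
struct layout and hash function here are guesses, not the proposal's code):

	#include <stdlib.h>

	#define PAGE_SIZE 4096UL

	struct rela { unsigned long r_info; long r_addend; };

	#define SLOTS (PAGE_SIZE / sizeof(struct rela *))

	/* Roughly one page worth of slots; pages chain on collision */
	struct hash_page {
		struct hash_page *next;
		const struct rela *slot[SLOTS];
	};

	static unsigned long hash_rela(const struct rela *r)
	{
		/* key derived from symbol + addend */
		return ((unsigned long)r->r_info ^
			(unsigned long)r->r_addend) % SLOTS;
	}

	/* Returns 1 if r was not seen before (and records it), 0 if dup */
	static int record_unique(struct hash_page **head, const struct rela *r)
	{
		unsigned long idx = hash_rela(r);
		struct hash_page *p;

		for (p = *head; p; p = p->next) {
			const struct rela *s = p->slot[idx];

			if (!s) {		/* free slot: first sighting */
				p->slot[idx] = r;
				return 1;
			}
			if (s->r_info == r->r_info &&
			    s->r_addend == r->r_addend)
				return 0;	/* duplicate entry */
			/* slot taken by a different entry: try next page */
		}

		/* collided in every existing page: chain a new page */
		p = calloc(1, sizeof(*p));	/* error handling elided */
		p->next = *head;
		*head = p;
		p->slot[idx] = r;
		return 1;
	}

	static unsigned count_relocs_hash(const struct rela *rela,
					  unsigned num)
	{
		struct hash_page *head = NULL;
		unsigned i, count = 0;

		for (i = 0; i < num; i++)
			count += record_unique(&head, &rela[i]);
		/* freeing the page list elided */
		return count;
	}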

The sort based approach is slower (O(n * log n + n)), but if the sorting is
done "in place" it doesn't need additional memory. A proposal was submitted
here:

http://ozlabs.org/pipermail/linuxppc-dev/2007-November/045854.html
(http://patchwork.ozlabs.org/linuxppc/patch?filter=default&id=14573)

In this proposal an array of pointers to the relocation entries is built and
then sorted using the kernel sort() utility function, a heap sort with a
guaranteed O(n * log n) complexity. Once the table is sorted, counting the
distinct/unique entries is just linear (O(n)). The problem is the extra memory
needed in this proposal: in the 35K relocation entries test case it can be as
much as 140K (for 32-bit PowerPCs; double that for 64-bit). This is much more
than the memory needed by the hash based approach described above, but it has
no degenerate corner cases lurking in it.

The current patch is a happy compromise between the two proposals above:
O(n + n * log n) performance with no additional memory requirements, achieved
by sorting the relocation table itself in place.
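
As an illustration of the approach (not the patch itself), a user-space
distillation with qsort() standing in for the kernel's sort():

	#include <stdio.h>
	#include <stdlib.h>

	struct rela { unsigned long r_info; long r_addend; };

	static int relacmp(const void *_x, const void *_y)
	{
		const struct rela *x = _x, *y = _y;

		if (x->r_info != y->r_info)
			return x->r_info < y->r_info ? -1 : 1;
		if (x->r_addend != y->r_addend)
			return x->r_addend < y->r_addend ? -1 : 1;
		return 0;
	}

	static unsigned count_distinct(struct rela *rela, unsigned num)
	{
		unsigned i, count = 0;

		qsort(rela, num, sizeof(*rela), relacmp); /* O(n log n), in place */
		for (i = 0; i < num; i++)	/* O(n): count heads of runs */
			if (i == 0 || relacmp(&rela[i - 1], &rela[i]) != 0)
				count++;
		return count;
	}

	int main(void)
	{
		struct rela r[] = { {1, 0}, {2, 4}, {1, 0}, {2, 0}, {2, 4} };

		printf("%u\n", count_distinct(r, 5));	/* prints 3 */
		return 0;
	}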

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
---

The style errors below are false positives: checkpatch parses the Elfxx_Rela
identifiers as operands instead of types.

powerpc> scripts/checkpatch.pl 0001-POWERPC-Optimize-counting-distinct-entries-in-the.patch 
ERROR: need consistent spacing around '*' (ctx:WxV)
#84: FILE: arch/powerpc/kernel/module_32.c:56:
+static uint32_t count_relocs(const Elf32_Rela *rela, uint32_t num)
                                               ^

ERROR: need consistent spacing around '*' (ctx:WxV)
#118: FILE: arch/powerpc/kernel/module_32.c:77:
+       const Elf32_Rela *x, *y;
                         ^

ERROR: need consistent spacing around '*' (ctx:WxV)
#190: FILE: arch/powerpc/kernel/module_64.c:83:
+static uint64_t count_relocs(const Elf64_Rela *rela, uint64_t num)
                                               ^

ERROR: need consistent spacing around '*' (ctx:WxV)
#235: FILE: arch/powerpc/kernel/module_64.c:124:
+       const Elf64_Rela *x, *y;
                         ^

total: 4 errors, 0 warnings, 0 checks, 225 lines checked
Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

 arch/powerpc/kernel/module_32.c |   78 +++++++++++++++++++++++++++++-------
 arch/powerpc/kernel/module_64.c |   85 ++++++++++++++++++++++++++++++--------
 2 files changed, 130 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c
index 07a89a3..866637a 100644
--- a/arch/powerpc/kernel/module_32.c
+++ b/arch/powerpc/kernel/module_32.c
@@ -24,6 +24,7 @@
 #include <linux/kernel.h>
 #include <linux/cache.h>
 #include <linux/bug.h>
+#include <linux/sort.h>
 
 #include "setup.h"
 
@@ -52,24 +53,61 @@ void module_free(struct module *mod, void *module_region)
 
 /* Count how many different relocations (different symbol, different
    addend) */
-static unsigned int count_relocs(const Elf32_Rela *rela, unsigned int num)
+static uint32_t count_relocs(const Elf32_Rela *rela, uint32_t num)
 {
-	unsigned int i, j, ret = 0;
-
-	/* Sure, this is order(n^2), but it's usually short, and not
-           time critical */
-	for (i = 0; i < num; i++) {
-		for (j = 0; j < i; j++) {
-			/* If this addend appeared before, it's
-                           already been counted */
-			if (ELF32_R_SYM(rela[i].r_info)
-			    == ELF32_R_SYM(rela[j].r_info)
-			    && rela[i].r_addend == rela[j].r_addend)
-				break;
+	int i;
+	uint32_t r_info, r_addend, _count_relocs;
+
+	_count_relocs = 0;
+	r_info = 0;
+	r_addend = 0;
+	for (i = 0; i < num; i++)
+		if (r_info != ELF32_R_SYM(rela[i].r_info) ||
+		   r_addend != rela[i].r_addend) {
+			_count_relocs++;
+			r_info = ELF32_R_SYM(rela[i].r_info);
+			r_addend = rela[i].r_addend;
 		}
-		if (j == i) ret++;
+
+	return _count_relocs;
+}
+
+static int relacmp(const void *_x, const void *_y)
+{
+	const Elf32_Rela *x, *y;
+
+	y = (Elf32_Rela *)_x;
+	x = (Elf32_Rela *)_y;
+
+	/* Compare the entire r_info (as opposed to ELF32_R_SYM(r_info) only) to
+	 * make the comparison cheaper/faster. It won't affect the sorting or
+	 * the counting algorithms' performance
+	 */
+	if (x->r_info < y->r_info)
+		return -1;
+	else if (x->r_info > y->r_info)
+		return 1;
+	else if (x->r_addend < y->r_addend)
+		return -1;
+	else if (x->r_addend > y->r_addend)
+		return 1;
+	else
+		return 0;
+}
+
+static void relaswap(void *_x, void *_y, int size)
+{
+	uint32_t *x, *y, tmp;
+	int i;
+
+	y = (uint32_t *)_x;
+	x = (uint32_t *)_y;
+
+	for (i = 0; i < sizeof(Elf32_Rela) / sizeof(uint32_t); i++) {
+		tmp = x[i];
+		x[i] = y[i];
+		y[i] = tmp;
 	}
-	return ret;
 }
 
 /* Get the potential trampolines size required of the init and
@@ -100,6 +138,16 @@ static unsigned long get_plt_size(const Elf32_Ehdr *hdr,
 			DEBUGP("Ptr: %p.  Number: %u\n",
 			       (void *)hdr + sechdrs[i].sh_offset,
 			       sechdrs[i].sh_size / sizeof(Elf32_Rela));
+
+			/* Sort the relocation information based on a symbol and
+			 * addend key. This is a stable O(n*log n) complexity
+			 * algorithm but it will reduce the complexity of
+			 * count_relocs() to linear complexity O(n)
+			 */
+			sort((void *)hdr + sechdrs[i].sh_offset,
+			     sechdrs[i].sh_size / sizeof(Elf32_Rela),
+			     sizeof(Elf32_Rela), relacmp, relaswap);
+
 			ret += count_relocs((void *)hdr
 					     + sechdrs[i].sh_offset,
 					     sechdrs[i].sh_size
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 75c7c4f..2ffe43f 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -24,6 +24,7 @@
 #include <asm/module.h>
 #include <asm/uaccess.h>
 #include <asm/firmware.h>
+#include <linux/sort.h>
 
 #include "setup.h"
 
@@ -79,27 +80,27 @@ static struct ppc64_stub_entry ppc64_stub =
 
 /* Count how many different 24-bit relocations (different symbol,
    different addend) */
-static unsigned int count_relocs(const Elf64_Rela *rela, unsigned int num)
+static uint64_t count_relocs(const Elf64_Rela *rela, uint64_t num)
 {
-	unsigned int i, j, ret = 0;
+	int i;
+	uint64_t r_info, r_addend, _count_relocs;
 
 	/* FIXME: Only count external ones --RR */
-	/* Sure, this is order(n^2), but it's usually short, and not
-           time critical */
-	for (i = 0; i < num; i++) {
+	_count_relocs = 0;
+	r_info = 0;
+	r_addend = 0;
+	for (i = 0; i < num; i++)
 		/* Only count 24-bit relocs, others don't need stubs */
-		if (ELF64_R_TYPE(rela[i].r_info) != R_PPC_REL24)
-			continue;
-		for (j = 0; j < i; j++) {
-			/* If this addend appeared before, it's
-                           already been counted */
-			if (rela[i].r_info == rela[j].r_info
-			    && rela[i].r_addend == rela[j].r_addend)
-				break;
+		if (ELF64_R_TYPE(rela[i].r_info) == R_PPC_REL24) {
+			if (r_info != ELF64_R_SYM(rela[i].r_info) ||
+			   r_addend != rela[i].r_addend) {
+				_count_relocs++;
+				r_info = ELF64_R_SYM(rela[i].r_info);
+				r_addend = rela[i].r_addend;
+			}
 		}
-		if (j == i) ret++;
-	}
-	return ret;
+
+	return _count_relocs;
 }
 
 void *module_alloc(unsigned long size)
@@ -118,6 +119,44 @@ void module_free(struct module *mod, void *module_region)
            table entries. */
 }
 
+static int relacmp(const void *_x, const void *_y)
+{
+	const Elf64_Rela *x, *y;
+
+	y = (Elf64_Rela *)_x;
+	x = (Elf64_Rela *)_y;
+
+	/* Compare the entire r_info (as opposed to ELF64_R_SYM(r_info) only) to
+	 * make the comparison cheaper/faster. It won't affect the sorting or
+	 * the counting algorithms' performance
+	 */
+	if (x->r_info < y->r_info)
+		return -1;
+	else if (x->r_info > y->r_info)
+		return 1;
+	else if (x->r_addend < y->r_addend)
+		return -1;
+	else if (x->r_addend > y->r_addend)
+		return 1;
+	else
+		return 0;
+}
+
+static void relaswap(void *_x, void *_y, int size)
+{
+	uint64_t *x, *y, tmp;
+	int i;
+
+	y = (uint64_t *)_x;
+	x = (uint64_t *)_y;
+
+	for (i = 0; i < sizeof(Elf64_Rela) / sizeof(uint64_t); i++) {
+		tmp = x[i];
+		x[i] = y[i];
+		y[i] = tmp;
+	}
+}
+
 /* Get size of potential trampolines required. */
 static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
 				    const Elf64_Shdr *sechdrs)
@@ -133,6 +172,16 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
 			DEBUGP("Ptr: %p.  Number: %lu\n",
 			       (void *)sechdrs[i].sh_addr,
 			       sechdrs[i].sh_size / sizeof(Elf64_Rela));
+
+			/* Sort the relocation information based on a symbol and
+			 * addend key. This is a stable O(n*log n) complexity
+			 * algorithm but it will reduce the complexity of
+			 * count_relocs() to linear complexity O(n)
+			 */
+			sort((void *)sechdrs[i].sh_addr,
+			     sechdrs[i].sh_size / sizeof(Elf64_Rela),
+			     sizeof(Elf64_Rela), relacmp, relaswap);
+
 			relocs += count_relocs((void *)sechdrs[i].sh_addr,
 					       sechdrs[i].sh_size
 					       / sizeof(Elf64_Rela));
@@ -343,7 +392,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 			/* Simply set it */
 			*(u32 *)location = value;
 			break;
-			
+
 		case R_PPC64_ADDR64:
 			/* Simply set it */
 			*(unsigned long *)location = value;
@@ -399,7 +448,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 			}
 
 			/* Only replace bits 2 through 26 */
-			*(uint32_t *)location 
+			*(uint32_t *)location
 				= (*(uint32_t *)location & ~0x03fffffc)
 				| (value & 0x03fffffc);
 			break;
-- 
1.5.3.GIT

* Re: [PATCH] [POWERPC] Optimize counting distinct entries in the relocation sections
  2007-11-08 23:36 [PATCH] [POWERPC] Optimize counting distinct entries in the relocation sections Emil Medve
@ 2007-11-12  6:00 ` Paul Mackerras
  2007-11-12  8:01   ` Rusty Russell
  2007-11-12 16:50   ` [PATCH] [POWERPC] Optimize counting distinct entries in the relocation sections Medve Emilian
  0 siblings, 2 replies; 8+ messages in thread
From: Paul Mackerras @ 2007-11-12  6:00 UTC (permalink / raw)
  To: Emil Medve; +Cc: sfr, rusty, linuxppc-dev, ntl, linuxppc-embedded

Emil Medve writes:

> (Not sure why the relocation tables could contain lots of duplicates and why
> they are not trimmed at compile time by the linker. In some test cases, out of
> 35K relocation entries only 1.5K were distinct/unique)

Presumably you have lots of calls to the same function, or lots of
references to the same variable.

Actually I notice that count_relocs is counting all relocs, not just
the R_PPC_REL24 ones, which are all that we actually care about in
sizing the PLT.  And I would be willing to bet that every single
R_PPC_REL24 reloc has r_addend == 0.

Also I notice that even with your patch, the actual process of doing
the relocations will take time proportional to the product of the
number of PLT entries times the number of R_PPC_REL24 relocations,
since we do a linear search through the PLT entries each time.

So, two approaches suggest themselves.  Both optimize the r_addend=0
case and fall back to something like the current code if r_addend is
not zero.  The first is to use the st_other field in the symbol to
record whether we have seen a R_PPC_REL24 reloc referring to the
symbol with r_addend=0.  That would make count_relocs of complexity
O(N) for N relocs.

The second is to allocate an array with 1 pointer per symbol that
points to the PLT entry (if any) for the symbol.  The count_relocs
scan can then use that array to store a 'seen before' flag to make its
scan O(N), and do_plt_call can then later use the same array to find
PLT entries without needing the linear scan.
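
A sketch of that second idea (hypothetical names and types, assuming the
common r_addend == 0 case, not kernel code):

	#include <elf.h>

	struct plt_entry;			/* opaque in this sketch */

	#define SEEN ((struct plt_entry *)1)	/* placeholder until the
						   real PLT slot exists */

	/* by_sym has one slot per symbol table entry; it doubles as the
	 * "seen before" flag while counting and, once filled with real
	 * pointers, as a direct PLT lookup table for do_plt_call(). */
	static unsigned count_rel24(struct plt_entry **by_sym,
				    const Elf32_Rela *rela, unsigned num)
	{
		unsigned i, count = 0;

		for (i = 0; i < num; i++) {
			unsigned sym = ELF32_R_SYM(rela[i].r_info);

			if (ELF32_R_TYPE(rela[i].r_info) != R_PPC_REL24)
				continue;
			if (!by_sym[sym]) {	/* first reloc for this sym */
				by_sym[sym] = SEEN;
				count++;
			}
		}
		return count;
	}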

As far as your proposed patch is concerned, I don't like having a
function called "count_relocs" changing the array of relocations.  At
the very least it needs a different name.  But I also think we can do
better than O(N * log N), as I have explained above, if my assertion
that r_addend=0 in all the cases we care about is correct.

Paul.

* Re: [PATCH] [POWERPC] Optimize counting distinct entries in the relocation sections
  2007-11-12  6:00 ` Paul Mackerras
@ 2007-11-12  8:01   ` Rusty Russell
  2007-11-12 11:55     ` Modulo operation in C for -ve values Deepak Gaur
  2007-11-12 16:50   ` [PATCH] [POWERPC] Optimize counting distinct entries in the relocation sections Medve Emilian
  1 sibling, 1 reply; 8+ messages in thread
From: Rusty Russell @ 2007-11-12  8:01 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: sfr, linuxppc-dev, ntl, linuxppc-embedded

On Monday 12 November 2007 17:00:43 Paul Mackerras wrote:
> Emil Medve writes:
> > (Not sure why the relocation tables could contain lots of duplicates and
> > why they are not trimmed at compile time by the linker. In some test
> > cases, out of 35K relocation entries only 1.5K were distinct/unique)
>
> Presumably you have lots of calls to the same function, or lots of
> references to the same variable.

Yes, and objdump -r is your friend here.  It might be worth checking that
we're counting relocs in a section we actually care about (i.e. it's
SHF_ALLOC).

It'd be a shame to optimize this only to find it could be fixed with a 
one-liner.
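
For instance (a hypothetical helper, not existing kernel code; sh_info of a
SHT_RELA section holds the index of the section its relocations apply to):

	#include <elf.h>

	/* Only size PLT space for relocation sections whose target section
	 * is actually loaded (SHF_ALLOC); relocs against discarded sections
	 * such as .comment or debug info never need trampolines. */
	static int rela_section_matters(const Elf32_Shdr *sechdrs,
					unsigned int i)
	{
		return sechdrs[i].sh_type == SHT_RELA &&
		       (sechdrs[sechdrs[i].sh_info].sh_flags & SHF_ALLOC);
	}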

> The first is to use the st_other field in the symbol to
> record whether we have seen a R_PPC_REL24 reloc referring to the
> symbol with r_addend=0.  That would make count_relocs of complexity
> O(N) for N relocs.

Just note when you implement this: there may be two PLT entries per symbol: 
one for init code and one for core code.  But that's just two bits.

If Paul is right about r_addend=0 being common, you can simply count unique 
occurrences of that, and add all the r_addend != 0 cases as if they were 
unique (note: it's fine to over-estimate), as sketched below.
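
Building on the per-symbol sketch in Paul's message above, that policy could
look like this (illustrative fragment for the body of the counting loop):

	/* After the R_PPC_REL24 type check: */
	if (rela[i].r_addend != 0)
		count++;	/* rare case: assume unique, over-estimate */
	else if (!by_sym[ELF32_R_SYM(rela[i].r_info)]) {
		by_sym[ELF32_R_SYM(rela[i].r_info)] = SEEN;
		count++;	/* first zero-addend use of this symbol */
	}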

> As far as your proposed patch is concerned, I don't like having a
> function called "count_relocs" changing the array of relocations.  At
> the very least it needs a different name.

Yes, and it should also return "u32" not "uint32_t" if you're going to change 
the return type.

Cheers,
Rusty.

* Modulo operation in C for -ve values
  2007-11-12  8:01   ` Rusty Russell
@ 2007-11-12 11:55     ` Deepak Gaur
  0 siblings, 0 replies; 8+ messages in thread
From: Deepak Gaur @ 2007-11-12 11:55 UTC (permalink / raw)
  To: linuxppc-embedded

The modulo operation as specified in
http://xenia.media.mit.edu/~bdenckla/thesis/texts/htthe/node13.html says that
for a fraction n/k, which can be expressed as n/k = i + j/k, the C division
and mod operations should yield
n div k = i (integer part)
n mod k = j (remainder part)
For positive n the above is true.
For negative n the hypothesis is that the remainder stays positive:
-n div k = -(i + 1)
-n mod k = k - j (positive remainder, for j != 0)

But running a sample program on Red Hat Enterprise Linux EL4
(libc-2.3.4, gcc 3.4.3 20041212, Red Hat 3.4.3-9.EL4):

#include <stdio.h>

int main(void)
{
	int n = -3, k = 8;

	printf("n div k %d\n", n / k);
	printf("n mod k %d\n", n % k);

	return 0;
}
gives the following output for n = -3, k = 8:
n div k 0
n mod k -3
though as per the hypothesis proposed in
http://xenia.media.mit.edu/~bdenckla/thesis/texts/htthe/node13.html
it should have been:
n div k -1
n mod k 5

Which is correct: (0, -3) or (-1, 5)?

Thanks
Deepak Gaur
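
For reference, C99 settles this: section 6.5.5 requires integer division to
truncate toward zero, with (a/b)*b + a%b == a, so a conforming compiler must
produce the (0, -3) pair; before C99 the rounding direction for negative
operands was implementation-defined. The floored (-1, 5) result can be built
on top of the truncating operators, as in this sketch:

	#include <stdio.h>

	/* Floored remainder built on C's truncating %; assumes k > 0. */
	static int mod_floor(int n, int k)
	{
		int r = n % k;		/* may be negative when n is */
		return r < 0 ? r + k : r;
	}

	static int div_floor(int n, int k)
	{
		return (n - mod_floor(n, k)) / k;
	}

	int main(void)
	{
		printf("%d %d\n", -3 / 8, -3 % 8);		  /* 0 -3 */
		printf("%d %d\n", div_floor(-3, 8), mod_floor(-3, 8)); /* -1 5 */
		return 0;
	}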

* RE: [PATCH] [POWERPC] Optimize counting distinct entries in the relocation sections
  2007-11-12  6:00 ` Paul Mackerras
  2007-11-12  8:01   ` Rusty Russell
@ 2007-11-12 16:50   ` Medve Emilian
  2007-11-12 22:31     ` [PATCH] [POWERPC] Optimize counting distinct entries in the relocation sections Medve Emilian
  1 sibling, 1 reply; 8+ messages in thread
From: Medve Emilian @ 2007-11-12 16:50 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: sfr, rusty, linuxppc-dev, ntl, linuxppc-embedded

Hello Paul,


> Actually I notice that count_relocs is counting all relocs, not just
> the R_PPC_REL24 ones, which are all that we actually care about in
> sizing the PLT.  And I would be willing to bet that every single
> R_PPC_REL24 reloc has r_addend == 0.

I'll count only the R_PPC_REL24 and I'll validate if they have r_addend
== 0.

> Also I notice that even with your patch, the actual process of doing
> the relocations will take time proportional to the product of the
> number of PLT entries times the number of R_PPC_REL24 relocations,
> since we do a linear search through the PLT entries each time.

The reason I started working on this patch was because the kernel
detected a soft lockup in count_relocs(). It didn't complain about other
parts so I did nothing about them.

> So, two approaches suggest themselves.  Both optimize the r_addend=0
> case and fall back to something like the current code if r_addend is
> not zero.  The first is to use the st_other field in the symbol to
> record whether we have seen a R_PPC_REL24 reloc referring to the
> symbol with r_addend=0.  That would make count_relocs of complexity
> O(N) for N relocs.

Will look into it.

> The second is to allocate an array with 1 pointer per symbol that
> points to the PLT entry (if any) for the symbol.  The count_relocs
> scan can then use that array to store a 'seen before' flag to make its
> scan O(N), and do_plt_call can then later use the same array to find
> PLT entries without needing the linear scan.

This uses extra memory (which could be 'significant' for small boards)
and I was trying to avoid that.


> As far as your proposed patch is concerned, I don't like having a
> function called "count_relocs" changing the array of relocations.  At
> the very least it needs a different name.  But I also think we can do
> better than O(N * log N), as I have explained above, if my assertion
> that r_addend=0 in all the cases we care about is correct.

The array of relocations is not changed in count_relocs() but in
get_plt_size().


Thanks for your time and patience,
Emil.

* RE: [PATCH] [POWERPC] Optimize counting distinct entries in the relocation sections
  2007-11-12 16:50   ` [PATCH] [POWERPC] Optimize counting distinct entries in the relocation sections Medve Emilian
@ 2007-11-12 22:31     ` Medve Emilian
  2007-11-13  2:49       ` Paul Mackerras
  0 siblings, 1 reply; 8+ messages in thread
From: Medve Emilian @ 2007-11-12 22:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: sfr, rusty, linuxppc-embedded, linuxppc-dev

Hello Paul,


> > Actually I notice that count_relocs is counting all relocs, not just
> > the R_PPC_REL24 ones, which are all that we actually care about in
> > sizing the PLT.  And I would be willing to bet that every single
> > R_PPC_REL24 reloc has r_addend == 0.
>
> I'll count only the R_PPC_REL24 and I'll validate if they have
> r_addend == 0.

Seems like there are R_PPC_REL24 with r_addend != 0. Within a set of 41
modules (featuring 5457 R_PPC_REL24 relocations) already included within
the kernel tree I found 37 such relocations (R_PPC_REL24) with r_addend
!= 0. In my test case, from 35K relocations, 7K are R_PPC_REL24 and from
those only 8 have r_addend != 0.

> > Also I notice that even with your patch, the actual process of doing
> > the relocations will take time proportional to the product of the
> > number of PLT entries times the number of R_PPC_REL24 relocations,
> > since we do a linear search through the PLT entries each time.
>
> The reason I started working on this patch was because the kernel
> detected a soft lockup in count_relocs(). It didn't complain about
> other parts so I did nothing about them.

Since the time hog in my case (and in others' cases, reflected by the
previous patches) seems to be count_relocs(), would it be acceptable, as
an incremental improvement, to fix this now and address the relocation
process performance later if needed?

I'll resubmit the patch incorporating some of the feedback from you and
Rusty.


Cheers,
Emil.

* RE: [PATCH] [POWERPC] Optimize counting distinct entries in the relocation sections
  2007-11-12 22:31     ` [PATCH] [POWERPC] Optimize counting distinct entries in therelocation sections Medve Emilian
@ 2007-11-13  2:49       ` Paul Mackerras
  2007-11-13  4:27         ` Rusty Russell
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Mackerras @ 2007-11-13  2:49 UTC (permalink / raw)
  To: Medve Emilian; +Cc: sfr, rusty, linuxppc-embedded, linuxppc-dev

Medve Emilian writes:

> Seems like there are R_PPC_REL24 with r_addend != 0. Within a set of 41
> modules (featuring 5457 R_PPC_REL24 relocations) already included within
> the kernel tree I found 37 such relocations (R_PPC_REL24) with r_addend
> != 0. In my test case, from 35K relocations, 7K are R_PPC_REL24 and from
> those only 8 have r_addend != 0.

I did a quick scan and the ones with r_addend != 0 all seem to be
references to .text from the .init.text, .exit.text or .fixup
sections.  Assuming we can get those allocated near each other they
shouldn't need trampolines.

Rusty, do we manage to put .init.text and .fixup near .text?

Also, do you know what we see in r_info for a relocation that is
relative to a section rather than a symbol?

Paul.

* Re: [PATCH] [POWERPC] Optimize counting distinct entries in the relocation sections
  2007-11-13  2:49       ` Paul Mackerras
@ 2007-11-13  4:27         ` Rusty Russell
  0 siblings, 0 replies; 8+ messages in thread
From: Rusty Russell @ 2007-11-13  4:27 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: sfr, linuxppc-embedded, linuxppc-dev

On Tuesday 13 November 2007 13:49:15 Paul Mackerras wrote:
> Medve Emilian writes:
> > Seems like there are R_PPC_REL24 with r_addend != 0. Within a set of 41
> > modules (featuring 5457 R_PPC_REL24 relocations) already included within
> > the kernel tree I found 37 such relocations (R_PPC_REL24) with r_addend
> > != 0. In my test case, from 35K relocations, 7K are R_PPC_REL24 and from
> > those only 8 have r_addend != 0.
>
> I did a quick scan and the ones with r_addend != 0 all seem to be
> references to .text from the .init.text, .exit.text or .fixup
> sections.  Assuming we can get those allocated near each other they
> shouldn't need trampolines.
>
> Rusty, do we manage to put .init.text and .fixup near .text?

Definitely not.  That's why we trampoline between them.  But since we discard 
the init sections and the tramps with them, I wouldn't bother uniquifying 
them: just alloc that many.

> Also, do you know what we see in r_info for a relocation that is
> relative to a section rather than a symbol?

Can't remember off the top of my head, sorry.

Rusty.
