* [PATCH] module: Fix performance regression on modules with large symbol tables
@ 2011-11-05 0:48 Kevin Cernekee
2011-11-07 1:29 ` Rusty Russell
0 siblings, 1 reply; 6+ messages in thread
From: Kevin Cernekee @ 2011-11-05 0:48 UTC (permalink / raw)
To: Rusty Russell; +Cc: linux-kernel, linux-kbuild, Jan Beulich
Commit 554bdfe5acf3715e87c8d5e25a4f9a896ac9f014 (module: reduce string
table for loaded modules) introduced an optimization to shrink the size of
the resident string table. Part of this involves calling bitmap_weight()
on the strmap bitmap once for each core symbol. strmap contains one bit
for each byte of the module's strtab.
For kernel modules with a large number of symbols, the addition of the
bitmap_weight() operation to each iteration of the add_kallsyms() loop
resulted in a significant "insmod" performance regression from 2.6.31
to 2.6.32. bitmap_weight() is expensive when the bitmap is large.
The proposed alternative optimizes the common case in this loop
(is_core_symbol() == true, and the symbol name is not a duplicate), while
penalizing the exceptional case of a duplicate symbol.
My test was run on a 600 MHz MIPS processor, using a kernel module with
15,000 "core" symbols and 10,000 symbols in .init.text. .strtab takes up
250,227 bytes.
Original code: insmod takes 3.39 seconds
Patched code: insmod takes 0.07 seconds
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
---
Since the new code performs an exhaustive string compare search when it
encounters duplicate symbols inside a module (i.e. multiple symtab entries
referring to the same strtab index), I did some extra checking on my
Linux PC to see how common this is:
For modules other than nvidia, there were 35 duplicate symbols out of
9,956 total LKM symbols (0.4%). This is with KALLSYMS and KALLSYMS_ALL
enabled. Many were ".LCx" literal constants, and others were random
duplications of trace_kmalloc(), cache_put(), do_vfs_lock(), etc.
Probably caused by combining multiple *.o files into a single *.ko file.
The nvidia module has 29,296 total entries, and 3,045 duplicates (10%).
There were 597 instances of each of: _nv009058rm, _nv009059rm,
_nv009060rm, and _nv009061rm.
To make sure the degenerate case of nvidia.ko was still covered, I ran
additional tests with qemu-system-arm (ARM Versatile) on Linus' head of
tree:
Latest kernel (commit 15831714), 25,000 symbol test (as above): 4.5s
Latest kernel with 2,400 (16%) of my module's core symbols turned into
duplicates through hex editing: 4.4s
Patched kernel, 25,000 symbol test: 0.1s
Patched kernel, with 2,400 duplicate symbols: 0.8s
So, even a module with large numbers of duplicate symbols loads more
quickly with my patch, than without it.
kernel/module.c | 26 ++++++++++++++++++--------
1 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index 93342d9..7f5dcbf 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2221,7 +2221,7 @@ static void layout_symtab(struct module *mod, struct load_info *info)
static void add_kallsyms(struct module *mod, const struct load_info *info)
{
- unsigned int i, ndst;
+ unsigned int i, j, stridx = 1, ndst;
const Elf_Sym *src;
Elf_Sym *dst;
char *s;
@@ -2237,22 +2237,32 @@ static void add_kallsyms(struct module *mod, const struct load_info *info)
mod->symtab[i].st_info = elf_type(&mod->symtab[i], info);
mod->core_symtab = dst = mod->module_core + info->symoffs;
+ mod->core_strtab = s = mod->module_core + info->stroffs;
src = mod->symtab;
*dst = *src;
+ *s++ = 0;
for (ndst = i = 1; i < mod->num_symtab; ++i, ++src) {
if (!is_core_symbol(src, info->sechdrs, info->hdr->e_shnum))
continue;
dst[ndst] = *src;
- dst[ndst].st_name = bitmap_weight(info->strmap,
- dst[ndst].st_name);
+ if (unlikely(!test_bit(src->st_name, info->strmap))) {
+ dst[ndst].st_name = 0;
+ for (j = 1; j < ndst; j++)
+ if (!strcmp(&mod->strtab[src->st_name],
+ &mod->core_strtab[dst[j].st_name]))
+ dst[ndst].st_name = dst[j].st_name;
+ } else {
+ dst[ndst].st_name = stridx;
+ j = src->st_name;
+ clear_bit(j, info->strmap);
+ do {
+ *s = mod->strtab[j++];
+ stridx++;
+ } while (*s++);
+ }
++ndst;
}
mod->core_num_syms = ndst;
-
- mod->core_strtab = s = mod->module_core + info->stroffs;
- for (*s = 0, i = 1; i < info->sechdrs[info->index.str].sh_size; ++i)
- if (test_bit(i, info->strmap))
- *++s = mod->strtab[i];
}
#else
static inline void layout_symtab(struct module *mod, struct load_info *info)
--
1.7.6.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] module: Fix performance regression on modules with large symbol tables
2011-11-05 0:48 [PATCH] module: Fix performance regression on modules with large symbol tables Kevin Cernekee
@ 2011-11-07 1:29 ` Rusty Russell
2011-11-07 19:58 ` Kevin Cernekee
0 siblings, 1 reply; 6+ messages in thread
From: Rusty Russell @ 2011-11-07 1:29 UTC (permalink / raw)
To: Kevin Cernekee; +Cc: linux-kernel, linux-kbuild, Jan Beulich
On Fri, 04 Nov 2011 17:48:58 -0700, Kevin Cernekee <cernekee@gmail.com> wrote:
> Commit 554bdfe5acf3715e87c8d5e25a4f9a896ac9f014 (module: reduce string
> table for loaded modules) introduced an optimization to shrink the size of
> the resident string table. Part of this involves calling bitmap_weight()
> on the strmap bitmap once for each core symbol. strmap contains one bit
> for each byte of the module's strtab.
>
> For kernel modules with a large number of symbols, the addition of the
> bitmap_weight() operation to each iteration of the add_kallsyms() loop
> resulted in a significant "insmod" performance regression from 2.6.31
> to 2.6.32. bitmap_weight() is expensive when the bitmap is large.
>
> The proposed alternative optimizes the common case in this loop
> (is_core_symbol() == true, and the symbol name is not a duplicate), while
> penalizing the exceptional case of a duplicate symbol.
I see you honored Jan's original patch by leaving it mysterious and
undocumented. I was totally confused by your patch.
I finally figured out that you pack the compacted strtab in symtab
order, and override the strmap bit to detect duplicates. Clever.
But I don't think it quite works in general. gcc is allowed to compact
the strtab itself, eg with two names "foobar" and "bar", it can reuse
the tail of the first strtab entry. It may not yet, but it could.
This makes your job a bit harder, but not too bad. See below...
I've started a separate patch to add comments to all this.
Cheers,
Rusty.
module: fix overlapping entries in symtab case.
The prior patch didn't handle the (theoretical?) case where strings in
the strtab are partially shared by two symtab entries. This detects
that correctly, and does an exhaustive search for that case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
diff --git a/kernel/module.c b/kernel/module.c
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2221,7 +2221,7 @@ static void layout_symtab(struct module
static void add_kallsyms(struct module *mod, const struct load_info *info)
{
- unsigned int i, j, stridx = 1, ndst;
+ unsigned int i, ndst;
const Elf_Sym *src;
Elf_Sym *dst;
char *s;
@@ -2242,23 +2242,26 @@ static void add_kallsyms(struct module *
*dst = *src;
*s++ = 0;
for (ndst = i = 1; i < mod->num_symtab; ++i, ++src) {
+ const char *name;
if (!is_core_symbol(src, info->sechdrs, info->hdr->e_shnum))
continue;
dst[ndst] = *src;
+
+ name = &mod->strtab[src->st_name];
if (unlikely(!test_bit(src->st_name, info->strmap))) {
- dst[ndst].st_name = 0;
- for (j = 1; j < ndst; j++)
- if (!strcmp(&mod->strtab[src->st_name],
- &mod->core_strtab[dst[j].st_name]))
- dst[ndst].st_name = dst[j].st_name;
+ char *dup;
+
+ for (dup = mod->core_strtab; strcmp(dup, name); dup++)
+ BUG_ON(dup > s);
+
+ dst[ndst].st_name = dup - mod->core_strtab;
} else {
- dst[ndst].st_name = stridx;
- j = src->st_name;
- clear_bit(j, info->strmap);
- do {
- *s = mod->strtab[j++];
- stridx++;
- } while (*s++);
+ unsigned len = strlen(name) + 1;
+
+ dst[ndst].st_name = s - mod->core_strtab;
+ memcpy(s, name, len);
+ s += len;
+ bitmap_clear(info->strmap, src->st_name, len);
}
++ndst;
}
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] module: Fix performance regression on modules with large symbol tables
2011-11-07 1:29 ` Rusty Russell
@ 2011-11-07 19:58 ` Kevin Cernekee
2011-11-07 23:27 ` Rusty Russell
2011-11-08 7:54 ` Jan Beulich
0 siblings, 2 replies; 6+ messages in thread
From: Kevin Cernekee @ 2011-11-07 19:58 UTC (permalink / raw)
To: Rusty Russell; +Cc: linux-kernel, linux-kbuild, Jan Beulich
On Sun, Nov 6, 2011 at 5:29 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> I see you honored Jan's original patch by leaving it mysterious and
> undocumented. I was totally confused by your patch.
Sorry - was just trying to mimic the existing level of verbosity.
> But I don't think it quite works in general. gcc is allowed to
> compact the strtab itself, eg with two names "foobar" and "bar", it
> can reuse the tail of the first strtab entry. It may not yet, but it
> could.
Well, sure enough, libbfd's elf-strtab.c is advertised as:
/* ELF strtab with GC and suffix merging support. */
The function that performs the suffix merging is
_bfd_elf_strtab_finalize(). For some reason this only seems to get
called for .shstrtab (or .dynstr), not for the larger .strtab . And I'm
not sure why this would be useful for .shstrtab, since the section names
usually start with '.'. I guess it might save a few dozen bytes for
cases like .init.data or .exit.text .
But as you said, there is nothing keeping the binutils maintainers from
extending this to .strtab in the future.
One thing I noticed when making a test case was that the earliest entry
in the section table doesn't necessarily pick the longest substring:
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000006 0000000000000000 AX 0 0 4
[ 2] .data PROGBITS 0000000000000000 00000048
0000000000000000 0000000000000000 WA 0 0 4
[ 3] .bss NOBITS 0000000000000000 00000048
0000000000000000 0000000000000000 WA 0 0 4
[ 4] a_ PROGBITS 0000000000000000 00000048
0000000000000006 0000000000000000 AX 0 0 1
[ 5] fedcba_ PROGBITS 0000000000000000 0000004e
0000000000000006 0000000000000000 AX 0 0 1
[ 6] cba_ PROGBITS 0000000000000000 00000054
0000000000000006 0000000000000000 AX 0 0 1
Hex dump of section '.shstrtab':
0x00000000 002e7379 6d746162 002e7374 72746162 ..symtab..strtab
0x00000010 002e7368 73747274 6162002e 74657874 ..shstrtab..text
0x00000020 002e6461 7461002e 62737300 67666564 ..data..bss.gfed
0x00000030 6362615f 002e636f 6d6d656e 74002e6e cba_..comment..n
0x00000040 6f74652e 474e552d 73746163 6b002e72 ote.GNU-stack..r
0x00000050 656c612e 65685f66 72616d65 00 ela.eh_frame.
So, the first entry referencing "gfedcba_" has sh_name 0x32 ("a_").
The second entry referencing "gfedcba_" has sh_name 0x2d ("fedcba_").
The third entry referencing "gfedcba_" has sh_name 0x30 ("cba_").
If the same holds true for .symtab, this means we probably need to
change the logic to "backtrack" to the beginning of the string before
copying it into core_strtab, because a subsequent symtab entry could
reference a larger chunk of the string.
We will also want to check the strmap bits, since it's possible that
"cba_" was a core symbol but "edcba_" is not. In which case we did not
allocate room for the "ed" part, back in layout_symtab().
> I've started a separate patch to add comments to all this.
Not sure how you want to handle combining these changes?
I'll just send a V2 that rolls in all of the desired modifications, if
there are no objections...
Thanks.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] module: Fix performance regression on modules with large symbol tables
2011-11-07 19:58 ` Kevin Cernekee
@ 2011-11-07 23:27 ` Rusty Russell
2011-11-08 7:54 ` Jan Beulich
1 sibling, 0 replies; 6+ messages in thread
From: Rusty Russell @ 2011-11-07 23:27 UTC (permalink / raw)
To: Kevin Cernekee; +Cc: linux-kernel, linux-kbuild, Jan Beulich
On Mon, 7 Nov 2011 11:58:08 -0800, Kevin Cernekee <cernekee@gmail.com> wrote:
> > I've started a separate patch to add comments to all this.
>
> Not sure how you want to handle combining these changes?
>
> I'll just send a V2 that rolls in all of the desired modifications, if
> there are no objections...
Please do. I'll write a comment patch on top of that, too.
Thanks!
Rusty.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] module: Fix performance regression on modules with large symbol tables
2011-11-07 19:58 ` Kevin Cernekee
2011-11-07 23:27 ` Rusty Russell
@ 2011-11-08 7:54 ` Jan Beulich
2011-11-08 21:30 ` Rusty Russell
1 sibling, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2011-11-08 7:54 UTC (permalink / raw)
To: Kevin Cernekee; +Cc: Rusty Russell, linux-kbuild, linux-kernel
>>> On 07.11.11 at 20:58, Kevin Cernekee <cernekee@gmail.com> wrote:
> We will also want to check the strmap bits, since it's possible that
> "cba_" was a core symbol but "edcba_" is not. In which case we did not
> allocate room for the "ed" part, back in layout_symtab().
Why would you want to keep the leading part if it's not part of a
core symbol's name?
Jan
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] module: Fix performance regression on modules with large symbol tables
2011-11-08 7:54 ` Jan Beulich
@ 2011-11-08 21:30 ` Rusty Russell
0 siblings, 0 replies; 6+ messages in thread
From: Rusty Russell @ 2011-11-08 21:30 UTC (permalink / raw)
To: Jan Beulich, Kevin Cernekee; +Cc: linux-kbuild, linux-kernel
On Tue, 08 Nov 2011 00:54:09 -0700, "Jan Beulich" <JBeulich@suse.com> wrote:
> >>> On 07.11.11 at 20:58, Kevin Cernekee <cernekee@gmail.com> wrote:
> > We will also want to check the strmap bits, since it's possible that
> > "cba_" was a core symbol but "edcba_" is not. In which case we did not
> > allocate room for the "ed" part, back in layout_symtab().
>
> Why would you want to keep the leading part if it's not part of a
> core symbol's name?
Indeed, sorry, I didn't read this part.
Cheers,
Rusty.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2011-11-08 23:42 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-05 0:48 [PATCH] module: Fix performance regression on modules with large symbol tables Kevin Cernekee
2011-11-07 1:29 ` Rusty Russell
2011-11-07 19:58 ` Kevin Cernekee
2011-11-07 23:27 ` Rusty Russell
2011-11-08 7:54 ` Jan Beulich
2011-11-08 21:30 ` Rusty Russell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox