From: Kevin Cernekee <cernekee@gmail.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-kernel@vger.kernel.org, linux-kbuild@vger.kernel.org,
Jan Beulich <JBeulich@novell.com>
Subject: [PATCH] module: Fix performance regression on modules with large symbol tables
Date: Fri, 04 Nov 2011 17:48:58 -0700
Message-ID: <467be983bed865e1b772136ed488f9ff@localhost>

Commit 554bdfe5acf3715e87c8d5e25a4f9a896ac9f014 (module: reduce string
table for loaded modules) introduced an optimization to shrink the size of
the resident string table. Part of this involves calling bitmap_weight()
on the strmap bitmap once for each core symbol. strmap contains one bit
for each byte of the module's strtab.
For kernel modules with a large number of symbols, the addition of the
bitmap_weight() operation to each iteration of the add_kallsyms() loop
resulted in a significant "insmod" performance regression from 2.6.31
to 2.6.32. bitmap_weight() is expensive when the bitmap is large.
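
To illustrate why the per-symbol bitmap_weight() call is costly, here is a
rough user-space sketch, not the kernel code. It assumes a hypothetical
keep[] array with one flag byte per strtab byte (the kernel packs these flags
into the strmap bitmap), and the helper names weight_below() and
old_new_offset() are invented for this example. A name's new offset is the
number of kept bytes that precede it, so every lookup rescans the table from
the start:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the kept bytes in keep[0..off). This plays the role of
 * bitmap_weight(strmap, off); keep[] is an unpacked, hypothetical
 * stand-in for the real bitmap.
 */
static size_t weight_below(const unsigned char *keep, size_t off)
{
	size_t n = 0, i;

	for (i = 0; i < off; i++)
		n += (keep[i] != 0);
	return n;
}

/*
 * Old scheme: translate an offset in the full strtab into an offset in
 * the compacted strtab. The cost is proportional to 'off', and it is
 * paid again for every core symbol.
 */
size_t old_new_offset(const unsigned char *keep, size_t len, size_t off)
{
	assert(off < len);
	return weight_below(keep, off);
}
```

Calling old_new_offset() once per core symbol makes the total work grow
roughly with (number of core symbols) x (strtab size), which is why large
modules are hit hardest.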
The proposed alternative optimizes the common case in this loop
(is_core_symbol() == true, and the symbol name is not a duplicate), while
penalizing the exceptional case of a duplicate symbol.
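
The idea can be sketched in user space as follows. This is a simplified
model under stated assumptions, not the patched add_kallsyms(): every symbol
is treated as a core symbol, pending[] is a hypothetical one-flag-per-byte
stand-in for strmap, and compact_names() and its parameters are names made up
for this example. Unlike the patch, the fallback search here stops at the
first match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * strtab:  full string table
 * pending: nonzero at each offset whose name has not been copied yet
 * src_off: name offset of each symbol within strtab
 * core:    output buffer; the caller must make it large enough
 * dst_off: receives each symbol's name offset within core
 * Returns the number of bytes written to core.
 */
size_t compact_names(const char *strtab, unsigned char *pending,
		     const size_t *src_off, size_t nsyms,
		     char *core, size_t *dst_off)
{
	size_t used = 0, i, j;

	assert(strtab && pending && src_off && core && dst_off);

	core[used++] = '\0';	/* offset 0 is the empty name */
	for (i = 0; i < nsyms; i++) {
		size_t off = src_off[i];

		if (pending[off]) {
			/* Common case: first use of this name, append it. */
			size_t n = strlen(strtab + off) + 1;

			pending[off] = 0;
			dst_off[i] = used;
			memcpy(core + used, strtab + off, n);
			used += n;
		} else {
			/* Rare case: name already copied, find it by search. */
			dst_off[i] = 0;
			for (j = 0; j < i; j++) {
				if (!strcmp(strtab + off, core + dst_off[j])) {
					dst_off[i] = dst_off[j];
					break;
				}
			}
		}
	}
	return used;
}
```

The common path does work proportional to the length of one name, and only
a duplicate pays for a linear scan over the symbols copied so far.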
My test was run on a 600 MHz MIPS processor, using a kernel module with
15,000 "core" symbols and 10,000 symbols in .init.text. .strtab takes up
250,227 bytes.
Original code: insmod takes 3.39 seconds
Patched code: insmod takes 0.07 seconds
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
---
Since the new code performs an exhaustive string compare search when it
encounters duplicate symbols inside a module (i.e. multiple symtab entries
referring to the same strtab index), I did some extra checking on my
Linux PC to see how common this is:
For modules other than nvidia, there were 35 duplicate symbols out of
9,956 total LKM symbols (0.4%). This is with KALLSYMS and KALLSYMS_ALL
enabled. Many were ".LCx" literal constants, and others were random
duplications of trace_kmalloc(), cache_put(), do_vfs_lock(), etc.
Probably caused by combining multiple *.o files into a single *.ko file.
The nvidia module has 29,296 total entries, and 3,045 duplicates (10%).
There were 597 instances of each of: _nv009058rm, _nv009059rm,
_nv009060rm, and _nv009061rm.
To make sure the degenerate case of nvidia.ko was still covered, I ran
additional tests with qemu-system-arm (ARM Versatile) on Linus' head of
tree:
Latest kernel (commit 15831714), 25,000 symbol test (as above): 4.5s
Latest kernel with 2,400 (16%) of my module's core symbols turned into
duplicates through hex editing: 4.4s
Patched kernel, 25,000 symbol test: 0.1s
Patched kernel, with 2,400 duplicate symbols: 0.8s
So even a module with a large number of duplicate symbols loads faster
with my patch than without it.
kernel/module.c | 26 ++++++++++++++++++--------
1 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index 93342d9..7f5dcbf 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2221,7 +2221,7 @@ static void layout_symtab(struct module *mod, struct load_info *info)
 
 static void add_kallsyms(struct module *mod, const struct load_info *info)
 {
-	unsigned int i, ndst;
+	unsigned int i, j, stridx = 1, ndst;
 	const Elf_Sym *src;
 	Elf_Sym *dst;
 	char *s;
@@ -2237,22 +2237,32 @@ static void add_kallsyms(struct module *mod, const struct load_info *info)
 		mod->symtab[i].st_info = elf_type(&mod->symtab[i], info);
 
 	mod->core_symtab = dst = mod->module_core + info->symoffs;
+	mod->core_strtab = s = mod->module_core + info->stroffs;
 	src = mod->symtab;
 	*dst = *src;
+	*s++ = 0;
 	for (ndst = i = 1; i < mod->num_symtab; ++i, ++src) {
 		if (!is_core_symbol(src, info->sechdrs, info->hdr->e_shnum))
 			continue;
 		dst[ndst] = *src;
-		dst[ndst].st_name = bitmap_weight(info->strmap,
-						  dst[ndst].st_name);
+		if (unlikely(!test_bit(src->st_name, info->strmap))) {
+			dst[ndst].st_name = 0;
+			for (j = 1; j < ndst; j++)
+				if (!strcmp(&mod->strtab[src->st_name],
+					    &mod->core_strtab[dst[j].st_name]))
+					dst[ndst].st_name = dst[j].st_name;
+		} else {
+			dst[ndst].st_name = stridx;
+			j = src->st_name;
+			clear_bit(j, info->strmap);
+			do {
+				*s = mod->strtab[j++];
+				stridx++;
+			} while (*s++);
+		}
 		++ndst;
 	}
 	mod->core_num_syms = ndst;
-
-	mod->core_strtab = s = mod->module_core + info->stroffs;
-	for (*s = 0, i = 1; i < info->sechdrs[info->index.str].sh_size; ++i)
-		if (test_bit(i, info->strmap))
-			*++s = mod->strtab[i];
 }
 #else
 static inline void layout_symtab(struct module *mod, struct load_info *info)
--
1.7.6.3