[PATCH V2 1/2] module: Add comments describing how the "strmap" logic works

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH V2 1/2] module: Add comments describing how the "strmap" logic works
@ 2011-11-13  3:08 Kevin Cernekee
  2011-11-13  3:08 ` [PATCH V2 2/2] module: Fix performance regression on modules with large symbol tables Kevin Cernekee
  0 siblings, 1 reply; 10+ messages in thread
From: Kevin Cernekee @ 2011-11-13  3:08 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, linux-kbuild, Jan Beulich

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
---
 kernel/module.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 178333c..cf9f1b6 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2193,6 +2193,13 @@ static void layout_symtab(struct module *mod, struct load_info *info)
 
 	src = (void *)info->hdr + symsect->sh_offset;
 	nsrc = symsect->sh_size / sizeof(*src);
+
+	/*
+	 * info->strmap has a '1' bit for each byte of .strtab we want to
+	 * keep resident in mod->core_strtab.  Everything else in .strtab
+	 * is unreferenced by the symbols in mod->core_symtab, and will be
+	 * discarded when add_kallsyms() compacts the string table.
+	 */
 	for (ndst = i = 1; i < nsrc; ++i, ++src)
 		if (is_core_symbol(src, info->sechdrs, info->hdr->e_shnum)) {
 			unsigned int j = src->st_name;
@@ -2215,6 +2222,8 @@ static void layout_symtab(struct module *mod, struct load_info *info)
 
 	/* Append room for core symbols' strings at end of core part. */
 	info->stroffs = mod->core_size;
+
+	/* First strtab byte (and first symtab entry) are zeroes. */
 	__set_bit(0, info->strmap);
 	mod->core_size += bitmap_weight(info->strmap, strsect->sh_size);
 }
-- 
1.7.6.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH V2 2/2] module: Fix performance regression on modules with large symbol tables
  2011-11-13  3:08 [PATCH V2 1/2] module: Add comments describing how the "strmap" logic works Kevin Cernekee
@ 2011-11-13  3:08 ` Kevin Cernekee
  2011-11-14  0:35   ` Rusty Russell
  2011-11-14  8:48   ` Jan Beulich
  0 siblings, 2 replies; 10+ messages in thread
From: Kevin Cernekee @ 2011-11-13  3:08 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, linux-kbuild, Jan Beulich

Commit 554bdfe5acf3715e87c8d5e25a4f9a896ac9f014 (module: reduce string
table for loaded modules) introduced an optimization to shrink the size of
the resident string table.  Part of this involves calling bitmap_weight()
on the strmap bitmap once for each core symbol.  strmap contains one bit
for each byte of the module's strtab.

For kernel modules with a large number of symbols, the addition of the
bitmap_weight() operation to each iteration of the add_kallsyms() loop
resulted in a significant "insmod" performance regression from 2.6.31
to 2.6.32.  bitmap_weight() is expensive when the bitmap is large.

The proposed alternative optimizes the common case in this loop
(is_core_symbol() == true, and the symbol name is not a duplicate), while
penalizing the exceptional case of a duplicate symbol.

My test was run on a 600 MHz MIPS processor, using a kernel module with
15,000 "core" symbols and 10,000 symbols in .init.text.  .strtab takes up
250,227 bytes.

Original code: insmod takes 3.39 seconds
Patched code: insmod takes 0.07 seconds

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
---

V2:

Properly handle cases where different symtab entries point to strtab
substrings (suffix merging).  Mostly Rusty's work.  Also, add comments.

My test case for suffix merging was:

$ nm --no-sort alphabet.ko
00000000 t hello
00000014 t init
00000000 d retcode
00000000 r __mod_retcodetype8
00000000 r __param_retcode
00000000 r __param_str_retcode
00000000 r $LC0
00000018 r __module_depends
00000000 r ____versions
00000021 r __mod_vermagic5
00000050 T uvwxyz
00000000 D __this_module
00000048 T yz
00000000 T qrstuvwxyz
00000060 T rstuvwxyz
00000014 T init_module
00000040 T wxyz
00000068 T tuvwxyz
         U printk
00000058 T vwxyz
         U param_ops_int

"qrstuvwxyz" is in .init.text (non-core symbol); all others were core
symbols.  The symtab entries for all substrings were hexedited to
point to the respective portion of the "qrstuvwxyz" strtab entry.

I verified that "rstuvwxyz" (without the 'q') was copied to core_strtab
when the loop hit "uvwxyz", and that the substrings all used that entry.

 kernel/module.c |   40 +++++++++++++++++++++++++++++++++-------
 1 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index cf9f1b6..6c87305 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2246,22 +2246,48 @@ static void add_kallsyms(struct module *mod, const struct load_info *info)
 		mod->symtab[i].st_info = elf_type(&mod->symtab[i], info);

 	mod->core_symtab = dst = mod->module_core + info->symoffs;
+	mod->core_strtab = s = mod->module_core + info->stroffs;
 	src = mod->symtab;
 	*dst = *src;
+	*s++ = 0;
 	for (ndst = i = 1; i < mod->num_symtab; ++i, ++src) {
+		const char *name;
 		if (!is_core_symbol(src, info->sechdrs, info->hdr->e_shnum))
 			continue;
 		dst[ndst] = *src;
-		dst[ndst].st_name = bitmap_weight(info->strmap,
-						  dst[ndst].st_name);
+
+		name = &mod->strtab[src->st_name];
+		if (unlikely(!test_bit(src->st_name, info->strmap))) {
+			/* Symbol name has already been copied; find it. */
+			char *dup;
+
+			for (dup = mod->core_strtab; strcmp(dup, name); dup++)
+				BUG_ON(dup > s);
+
+			dst[ndst].st_name = dup - mod->core_strtab;
+		} else {
+			/*
+			 * Copy the symbol name to mod->core_strtab.
+			 * "name" might point to the middle of a longer strtab
+			 * entry, so backtrack to the first "required" byte
+			 * of the string.
+			 */
+			unsigned start = src->st_name, len = 0;
+
+			for (; test_bit(start - 1, info->strmap) &&
+			       mod->strtab[start - 1]; start--)
+				len++;
+
+			dst[ndst].st_name = len + s - mod->core_strtab;
+			len += strlen(name) + 1;
+
+			memcpy(s, &mod->strtab[start], len);
+			s += len;
+			bitmap_clear(info->strmap, start, len);
+		}
 		++ndst;
 	}
 	mod->core_num_syms = ndst;
-
-	mod->core_strtab = s = mod->module_core + info->stroffs;
-	for (*s = 0, i = 1; i < info->sechdrs[info->index.str].sh_size; ++i)
-		if (test_bit(i, info->strmap))
-			*++s = mod->strtab[i];
 }
 #else
 static inline void layout_symtab(struct module *mod, struct load_info *info)
-- 
1.7.6.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH V2 2/2] module: Fix performance regression on modules with large symbol tables
  2011-11-13  3:08 ` [PATCH V2 2/2] module: Fix performance regression on modules with large symbol tables Kevin Cernekee
@ 2011-11-14  0:35   ` Rusty Russell
  2011-11-14  8:48   ` Jan Beulich
  1 sibling, 0 replies; 10+ messages in thread
From: Rusty Russell @ 2011-11-14  0:35 UTC (permalink / raw)
  To: Kevin Cernekee; +Cc: linux-kernel, linux-kbuild, Jan Beulich

On Sat, 12 Nov 2011 19:08:56 -0800, Kevin Cernekee <cernekee@gmail.com> wrote:
> +			/*
> +			 * Copy the symbol name to mod->core_strtab.
> +			 * "name" might point to the middle of a longer strtab
> +			 * entry, so backtrack to the first "required" byte
> +			 * of the string.
> +			 */

Icky, but it works.

Both applied, thanks!
Rusty.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2 2/2] module: Fix performance regression on modules with large symbol tables
  2011-11-13  3:08 ` [PATCH V2 2/2] module: Fix performance regression on modules with large symbol tables Kevin Cernekee
  2011-11-14  0:35   ` Rusty Russell
@ 2011-11-14  8:48   ` Jan Beulich
  2011-11-16  0:24     ` Rusty Russell
  2011-11-18  3:15     ` Kevin Cernekee
  1 sibling, 2 replies; 10+ messages in thread
From: Jan Beulich @ 2011-11-14  8:48 UTC (permalink / raw)
  To: Kevin Cernekee, Rusty Russell; +Cc: linux-kbuild, linux-kernel

>>> On 13.11.11 at 04:08, Kevin Cernekee <cernekee@gmail.com> wrote:
> Commit 554bdfe5acf3715e87c8d5e25a4f9a896ac9f014 (module: reduce string
> table for loaded modules) introduced an optimization to shrink the size of
> the resident string table.  Part of this involves calling bitmap_weight()
> on the strmap bitmap once for each core symbol.  strmap contains one bit
> for each byte of the module's strtab.
> 
> For kernel modules with a large number of symbols, the addition of the
> bitmap_weight() operation to each iteration of the add_kallsyms() loop
> resulted in a significant "insmod" performance regression from 2.6.31
> to 2.6.32.  bitmap_weight() is expensive when the bitmap is large.
> 
> The proposed alternative optimizes the common case in this loop
> (is_core_symbol() == true, and the symbol name is not a duplicate), while
> penalizing the exceptional case of a duplicate symbol.
> 
> My test was run on a 600 MHz MIPS processor, using a kernel module with
> 15,000 "core" symbols and 10,000 symbols in .init.text.  .strtab takes up
> 250,227 bytes.
> 
> Original code: insmod takes 3.39 seconds
> Patched code: insmod takes 0.07 seconds
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
> ---
> 
> V2:
> 
> Properly handle cases where different symtab entries point to strtab
> substrings (suffix merging).  Mostly Rusty's work.  Also, add comments.
> 
> My test case for suffix merging was:
> 
> $ nm --no-sort alphabet.ko
> 00000000 t hello
> 00000014 t init
> 00000000 d retcode
> 00000000 r __mod_retcodetype8
> 00000000 r __param_retcode
> 00000000 r __param_str_retcode
> 00000000 r $LC0
> 00000018 r __module_depends
> 00000000 r ____versions
> 00000021 r __mod_vermagic5
> 00000050 T uvwxyz
> 00000000 D __this_module
> 00000048 T yz
> 00000000 T qrstuvwxyz
> 00000060 T rstuvwxyz
> 00000014 T init_module
> 00000040 T wxyz
> 00000068 T tuvwxyz
>          U printk
> 00000058 T vwxyz
>          U param_ops_int
> 
> "qrstuvwxyz" is in .init.text (non-core symbol); all others were core
> symbols.  The symtab entries for all substrings were hexedited to
> point to the respective portion of the "qrstuvwxyz" strtab entry.
> 
> I verified that "rstuvwxyz" (without the 'q') was copied to core_strtab
> when the loop hit "uvwxyz", and that the substrings all used that entry.
> 
> 
>  kernel/module.c |   40 +++++++++++++++++++++++++++++++++-------
>  1 files changed, 33 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index cf9f1b6..6c87305 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2246,22 +2246,48 @@ static void add_kallsyms(struct module *mod, const 
> struct load_info *info)
>  		mod->symtab[i].st_info = elf_type(&mod->symtab[i], info);
>  
>  	mod->core_symtab = dst = mod->module_core + info->symoffs;
> +	mod->core_strtab = s = mod->module_core + info->stroffs;
>  	src = mod->symtab;
>  	*dst = *src;
> +	*s++ = 0;
>  	for (ndst = i = 1; i < mod->num_symtab; ++i, ++src) {
> +		const char *name;
>  		if (!is_core_symbol(src, info->sechdrs, info->hdr->e_shnum))
>  			continue;
>  		dst[ndst] = *src;
> -		dst[ndst].st_name = bitmap_weight(info->strmap,
> -						  dst[ndst].st_name);
> +
> +		name = &mod->strtab[src->st_name];
> +		if (unlikely(!test_bit(src->st_name, info->strmap))) {
> +			/* Symbol name has already been copied; find it. */
> +			char *dup;
> +
> +			for (dup = mod->core_strtab; strcmp(dup, name); dup++)
> +				BUG_ON(dup > s);

Aren't you concerned that this again will be rather slow? It would be
pretty easy to accelerate by comparing only with the tail of each string
(as nothing else can possibly match), moving from string to string instead
of from character to character.

Jan

> +
> +			dst[ndst].st_name = dup - mod->core_strtab;
> +		} else {
> +			/*
> +			 * Copy the symbol name to mod->core_strtab.
> +			 * "name" might point to the middle of a longer strtab
> +			 * entry, so backtrack to the first "required" byte
> +			 * of the string.
> +			 */
> +			unsigned start = src->st_name, len = 0;
> +
> +			for (; test_bit(start - 1, info->strmap) &&
> +			       mod->strtab[start - 1]; start--)
> +				len++;
> +
> +			dst[ndst].st_name = len + s - mod->core_strtab;
> +			len += strlen(name) + 1;
> +
> +			memcpy(s, &mod->strtab[start], len);
> +			s += len;
> +			bitmap_clear(info->strmap, start, len);
> +		}
>  		++ndst;
>  	}
>  	mod->core_num_syms = ndst;
> -
> -	mod->core_strtab = s = mod->module_core + info->stroffs;
> -	for (*s = 0, i = 1; i < info->sechdrs[info->index.str].sh_size; ++i)
> -		if (test_bit(i, info->strmap))
> -			*++s = mod->strtab[i];
>  }
>  #else
>  static inline void layout_symtab(struct module *mod, struct load_info 
> *info)




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2 2/2] module: Fix performance regression on modules with large symbol tables
  2011-11-14  8:48   ` Jan Beulich
@ 2011-11-16  0:24     ` Rusty Russell
  2011-11-18  3:15     ` Kevin Cernekee
  1 sibling, 0 replies; 10+ messages in thread
From: Rusty Russell @ 2011-11-16  0:24 UTC (permalink / raw)
  To: Jan Beulich, Kevin Cernekee; +Cc: linux-kbuild, linux-kernel

On Mon, 14 Nov 2011 08:48:52 +0000, "Jan Beulich" <JBeulich@suse.com> wrote:
> > +		name = &mod->strtab[src->st_name];
> > +		if (unlikely(!test_bit(src->st_name, info->strmap))) {
> > +			/* Symbol name has already been copied; find it. */
> > +			char *dup;
> > +
> > +			for (dup = mod->core_strtab; strcmp(dup, name); dup++)
> > +				BUG_ON(dup > s);
> 
> Aren't you concerned that this again will be rather slow? It would be
> pretty easy to accelerate by comparing only with the tail of each string
> (as nothing else can possibly match), moving from string to string instead
> of from character to character.

Kevin's central thesis is that this is actually really unusual.

I'm not sure how much faster a tail search would be in practice.  We're
still scanning the string.  Perhaps a "dup[0] == name[0] &&" would
optimize it almost as well, if gcc doesn't already?

Kevin, you're probably in an optimal position to get numbers on this?

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2 2/2] module: Fix performance regression on modules with large symbol tables
  2011-11-14  8:48   ` Jan Beulich
  2011-11-16  0:24     ` Rusty Russell
@ 2011-11-18  3:15     ` Kevin Cernekee
  2011-11-21  5:06       ` Rusty Russell
  1 sibling, 1 reply; 10+ messages in thread
From: Kevin Cernekee @ 2011-11-18  3:15 UTC (permalink / raw)
  To: 'Jan Beulich'; +Cc: 'Rusty Russell', linux-kbuild, linux-kernel

On Mon, 14 Nov 2011 08:48:52 +0000, "Jan Beulich" <JBeulich@suse.com> wrote:
> > +                   for (dup = mod->core_strtab; strcmp(dup, name); dup++)
> > +                           BUG_ON(dup > s);
>
> Aren't you concerned that this again will be rather slow? It would be
> pretty easy to accelerate by comparing only with the tail of each string
> (as nothing else can possibly match), moving from string to string instead
> of from character to character.

I reinstalled an old laptop today and then ran a few numbers for fun.

System: Linux 3.0.4, Ubuntu 11.10, x86 32-bit, Intel Core 2 P8700,
nvidia 280.13 driver

Looking at /proc/kallsyms, one starts to ponder whether all of the extra
strtab-related complexity in module.c is worth the memory savings:

nvidia.ko: 33,785 entries (3,037 are duplicates)
Other *.ko files: 5,972 entries (3 are duplicates)
Kernel image: 62,737 entries

Sum of lengths of all *.ko core symbol names + terminating bytes:
542,824 bytes; 408,585 of that is from nvidia.ko

Potential memory savings from suffix merging (shorter strings refer to a
longer strtab entry, exact matches excluded): 13,156 bytes total; 487
from nvidia.ko.  These mostly came from entries starting with __kcrctab*
or __param* .

Potential memory savings from reusing EXACT strtab matches: 36,475 bytes
total; 35,432 from nvidia.ko.  For comparison, discarding non-core
strtab entries saved about 91KB on my system.

Instead of making the add_kallsyms() loop even more complex, I tried the
other route of deleting the strmap logic and naively copying each string
into core_strtab with no consideration for consolidating duplicates.
According to the above figures, this would unnecessarily cost me about
49KB (10%) if binutils did suffix merging.

Performance on an "already exists" insmod of nvidia.ko (runs
add_kallsyms() but does not actually initialize the module):

Original scheme: 1.230s
With patch V2: 0.280s
With naive copying: 0.058s

I am posting the "naive copying" patch in case anybody else wants to
follow up on this.  I am satisfied enough with the way the "V2" code
works and do not know if any additional tweaking is warranted.  But if
you can make good use of my patch, feel free:

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
---

diff --git a/kernel/module.c b/kernel/module.c
index 6c87305..b07f71c 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -138,7 +138,6 @@ struct load_info {
 	unsigned long len;
 	Elf_Shdr *sechdrs;
 	char *secstrings, *strtab;
-	unsigned long *strmap;
 	unsigned long symoffs, stroffs;
 	struct _ddebug *debug;
 	unsigned int num_debug;
@@ -2183,7 +2182,7 @@ static void layout_symtab(struct module *mod, struct load_info *info)
 	Elf_Shdr *symsect = info->sechdrs + info->index.sym;
 	Elf_Shdr *strsect = info->sechdrs + info->index.str;
 	const Elf_Sym *src;
-	unsigned int i, nsrc, ndst;
+	unsigned int i, nsrc, ndst, strtab_size;
 
 	/* Put symbol section at end of init part of module. */
 	symsect->sh_flags |= SHF_ALLOC;
@@ -2194,38 +2193,23 @@ static void layout_symtab(struct module *mod, struct load_info *info)
 	src = (void *)info->hdr + symsect->sh_offset;
 	nsrc = symsect->sh_size / sizeof(*src);
 
-	/*
-	 * info->strmap has a '1' bit for each byte of .strtab we want to
-	 * keep resident in mod->core_strtab.  Everything else in .strtab
-	 * is unreferenced by the symbols in mod->core_symtab, and will be
-	 * discarded when add_kallsyms() compacts the string table.
-	 */
-	for (ndst = i = 1; i < nsrc; ++i, ++src)
+	/* Compute total space required for the core symbols' strtab. */
+	for (ndst = i = strtab_size = 1; i < nsrc; ++i, ++src)
 		if (is_core_symbol(src, info->sechdrs, info->hdr->e_shnum)) {
-			unsigned int j = src->st_name;
-
-			while (!__test_and_set_bit(j, info->strmap)
-			       && info->strtab[j])
-				++j;
-			++ndst;
+			strtab_size += strlen(&info->strtab[src->st_name]) + 1;
+			ndst++;
 		}
 
 	/* Append room for core symbols at end of core part. */
 	info->symoffs = ALIGN(mod->core_size, symsect->sh_addralign ?: 1);
-	mod->core_size = info->symoffs + ndst * sizeof(Elf_Sym);
+	info->stroffs = mod->core_size = info->symoffs + ndst * sizeof(Elf_Sym);
+	mod->core_size += strtab_size;
 
 	/* Put string table section at end of init part of module. */
 	strsect->sh_flags |= SHF_ALLOC;
 	strsect->sh_entsize = get_offset(mod, &mod->init_size, strsect,
 					 info->index.str) | INIT_OFFSET_MASK;
 	DEBUGP("\t%s\n", info->secstrings + strsect->sh_name);
-
-	/* Append room for core symbols' strings at end of core part. */
-	info->stroffs = mod->core_size;
-
-	/* First strtab byte (and first symtab entry) are zeroes. */
-	__set_bit(0, info->strmap);
-	mod->core_size += bitmap_weight(info->strmap, strsect->sh_size);
 }
 
 static void add_kallsyms(struct module *mod, const struct load_info *info)
@@ -2251,41 +2235,12 @@ static void add_kallsyms(struct module *mod, const struct load_info *info)
 	*dst = *src;
 	*s++ = 0;
 	for (ndst = i = 1; i < mod->num_symtab; ++i, ++src) {
-		const char *name;
 		if (!is_core_symbol(src, info->sechdrs, info->hdr->e_shnum))
 			continue;
-		dst[ndst] = *src;
 
-		name = &mod->strtab[src->st_name];
-		if (unlikely(!test_bit(src->st_name, info->strmap))) {
-			/* Symbol name has already been copied; find it. */
-			char *dup;
-
-			for (dup = mod->core_strtab; strcmp(dup, name); dup++)
-				BUG_ON(dup > s);
-
-			dst[ndst].st_name = dup - mod->core_strtab;
-		} else {
-			/*
-			 * Copy the symbol name to mod->core_strtab.
-			 * "name" might point to the middle of a longer strtab
-			 * entry, so backtrack to the first "required" byte
-			 * of the string.
-			 */
-			unsigned start = src->st_name, len = 0;
-
-			for (; test_bit(start - 1, info->strmap) &&
-			       mod->strtab[start - 1]; start--)
-				len++;
-
-			dst[ndst].st_name = len + s - mod->core_strtab;
-			len += strlen(name) + 1;
-
-			memcpy(s, &mod->strtab[start], len);
-			s += len;
-			bitmap_clear(info->strmap, start, len);
-		}
-		++ndst;
+		dst[ndst] = *src;
+		dst[ndst++].st_name = s - mod->core_strtab;
+		s += strlcpy(s, &mod->strtab[src->st_name], KSYM_NAME_LEN) + 1;
 	}
 	mod->core_num_syms = ndst;
 }
@@ -2777,27 +2732,18 @@ static struct module *layout_and_allocate(struct load_info *info)
 	   this is done generically; there doesn't appear to be any
 	   special cases for the architectures. */
 	layout_sections(mod, info);
-
-	info->strmap = kzalloc(BITS_TO_LONGS(info->sechdrs[info->index.str].sh_size)
-			 * sizeof(long), GFP_KERNEL);
-	if (!info->strmap) {
-		err = -ENOMEM;
-		goto free_percpu;
-	}
 	layout_symtab(mod, info);
 
 	/* Allocate and move to the final place */
 	err = move_module(mod, info);
 	if (err)
-		goto free_strmap;
+		goto free_percpu;
 
 	/* Module has been copied to its final place now: return it. */
 	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
 	kmemleak_load_module(mod, info);
 	return mod;
 
-free_strmap:
-	kfree(info->strmap);
 free_percpu:
 	percpu_modfree(mod);
 out:
@@ -2807,7 +2753,6 @@ out:
 /* mod is no longer valid after this! */
 static void module_deallocate(struct module *mod, struct load_info *info)
 {
-	kfree(info->strmap);
 	percpu_modfree(mod);
 	module_free(mod, mod->module_init);
 	module_free(mod, mod->module_core);
@@ -2937,8 +2882,7 @@ static struct module *load_module(void __user *umod,
 	if (err < 0)
 		goto unlink;
 
-	/* Get rid of temporary copy and strmap. */
-	kfree(info.strmap);
+	/* Get rid of temporary copy. */
 	free_copy(&info);
 
 	/* Done! */
--

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH V2 2/2] module: Fix performance regression on modules with large symbol tables
  2011-11-18  3:15     ` Kevin Cernekee
@ 2011-11-21  5:06       ` Rusty Russell
  2011-11-21  6:29         ` Kevin Cernekee
  2011-11-21  8:08         ` Jan Beulich
  0 siblings, 2 replies; 10+ messages in thread
From: Rusty Russell @ 2011-11-21  5:06 UTC (permalink / raw)
  To: Kevin Cernekee, 'Jan Beulich'; +Cc: linux-kbuild, linux-kernel

On Thu, 17 Nov 2011 19:15:02 -0800, Kevin Cernekee <cernekee@gmail.com> wrote:
> Potential memory savings from reusing EXACT strtab matches: 36,475 bytes
> total; 35,432 from nvidia.ko.  For comparison, discarding non-core
> strtab entries saved about 91KB on my system.
...
> Original scheme: 1.230s
> With patch V2: 0.280s
> With naive copying: 0.058s

I'm deeply tempted.  It's very simple, 46 lines shorter, preserves exact
matches, and doesn't have any strange slowdowns on corner cases.

But like Kevin, I could be convinced either way.  Jan?

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2 2/2] module: Fix performance regression on modules with large symbol tables
  2011-11-21  5:06       ` Rusty Russell
@ 2011-11-21  6:29         ` Kevin Cernekee
  2011-11-21 23:57           ` Rusty Russell
  2011-11-21  8:08         ` Jan Beulich
  1 sibling, 1 reply; 10+ messages in thread
From: Kevin Cernekee @ 2011-11-21  6:29 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Jan Beulich, linux-kbuild, linux-kernel

On Sun, Nov 20, 2011 at 9:06 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> On Thu, 17 Nov 2011 19:15:02 -0800, Kevin Cernekee <cernekee@gmail.com> wrote:
> I'm deeply tempted.  It's very simple, 46 lines shorter, preserves exact
> matches, and doesn't have any strange slowdowns on corner cases.

Unfortunately, the last patch I posted still makes duplicate copies of
the exact matches.  And the copy of nvidia.ko I'm looking at right now
does reuse strtab entries for the duplicated symbols.

FWIW, preserving exact matches would save about 35KB on a kernel
module whose .text + .data + .rodata sections add up to ~9.5MB.

I'm sure there are more clever ways to do it, but a fast brute force
approach could involve storing the core_strtab offsets in a temporary
array indexed by the original strtab st_name values.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2 2/2] module: Fix performance regression on modules with large symbol tables
  2011-11-21  6:29         ` Kevin Cernekee
@ 2011-11-21 23:57           ` Rusty Russell
  0 siblings, 0 replies; 10+ messages in thread
From: Rusty Russell @ 2011-11-21 23:57 UTC (permalink / raw)
  To: Kevin Cernekee; +Cc: Jan Beulich, linux-kbuild, linux-kernel

On Sun, 20 Nov 2011 22:29:55 -0800, Kevin Cernekee <cernekee@gmail.com> wrote:
> On Sun, Nov 20, 2011 at 9:06 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> > On Thu, 17 Nov 2011 19:15:02 -0800, Kevin Cernekee <cernekee@gmail.com> wrote:
> > I'm deeply tempted.  It's very simple, 46 lines shorter, preserves exact
> > matches, and doesn't have any strange slowdowns on corner cases.
> 
> Unfortunately, the last patch I posted still makes duplicate copies of
> the exact matches.  And the copy of nvidia.ko I'm looking at right now
> does reuse strtab entries for the duplicated symbols.

Ah, I read it too fast.

Still, simplicity wins.  I like Jan's point, I'll put in a comment and
reference to this thread for future adventurers.  Result below.

Thanks!
Rusty.

From: Kevin Cernekee <cernekee@gmail.com>
Subject: module: Fix performance regression on modules with large symbol tables

Looking at /proc/kallsyms, one starts to ponder whether all of the extra
strtab-related complexity in module.c is worth the memory savings.

Instead of making the add_kallsyms() loop even more complex, I tried the
other route of deleting the strmap logic and naively copying each string
into core_strtab with no consideration for consolidating duplicates.

Performance on an "already exists" insmod of nvidia.ko (runs
add_kallsyms() but does not actually initialize the module):

	Original scheme: 1.230s
	With naive copying: 0.058s

Extra space used: 35k (of a 408k module).

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <73defb5e4bca04a6431392cc341112b1@localhost>
---
 kernel/module.c |   65 ++++++++++++++++++--------------------------------------
 1 file changed, 21 insertions(+), 44 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -138,7 +138,6 @@ struct load_info {
 	unsigned long len;
 	Elf_Shdr *sechdrs;
 	char *secstrings, *strtab;
-	unsigned long *strmap;
 	unsigned long symoffs, stroffs;
 	struct _ddebug *debug;
 	unsigned int num_debug;
@@ -2178,12 +2177,19 @@ static bool is_core_symbol(const Elf_Sym
 	return true;
 }
 
+/*
+ * We only allocate and copy the strings needed by the parts of symtab
+ * we keep.  This is simple, but has the effect of making multiple
+ * copies of duplicates.  We could be more sophisticated, see
+ * linux-kernel thread starting with
+ * <73defb5e4bca04a6431392cc341112b1@localhost>.
+ */
 static void layout_symtab(struct module *mod, struct load_info *info)
 {
 	Elf_Shdr *symsect = info->sechdrs + info->index.sym;
 	Elf_Shdr *strsect = info->sechdrs + info->index.str;
 	const Elf_Sym *src;
-	unsigned int i, nsrc, ndst;
+	unsigned int i, nsrc, ndst, strtab_size;
 
 	/* Put symbol section at end of init part of module. */
 	symsect->sh_flags |= SHF_ALLOC;
@@ -2194,38 +2200,23 @@ static void layout_symtab(struct module 
 	src = (void *)info->hdr + symsect->sh_offset;
 	nsrc = symsect->sh_size / sizeof(*src);
 
-	/*
-	 * info->strmap has a '1' bit for each byte of .strtab we want to
-	 * keep resident in mod->core_strtab.  Everything else in .strtab
-	 * is unreferenced by the symbols in mod->core_symtab, and will be
-	 * discarded when add_kallsyms() compacts the string table.
-	 */
-	for (ndst = i = 1; i < nsrc; ++i, ++src)
+	/* Compute total space required for the core symbols' strtab. */
+	for (ndst = i = strtab_size = 1; i < nsrc; ++i, ++src)
 		if (is_core_symbol(src, info->sechdrs, info->hdr->e_shnum)) {
-			unsigned int j = src->st_name;
-
-			while (!__test_and_set_bit(j, info->strmap)
-			       && info->strtab[j])
-				++j;
-			++ndst;
+			strtab_size += strlen(&info->strtab[src->st_name]) + 1;
+			ndst++;
 		}
 
 	/* Append room for core symbols at end of core part. */
 	info->symoffs = ALIGN(mod->core_size, symsect->sh_addralign ?: 1);
-	mod->core_size = info->symoffs + ndst * sizeof(Elf_Sym);
+	info->stroffs = mod->core_size = info->symoffs + ndst * sizeof(Elf_Sym);
+	mod->core_size += strtab_size;
 
 	/* Put string table section at end of init part of module. */
 	strsect->sh_flags |= SHF_ALLOC;
 	strsect->sh_entsize = get_offset(mod, &mod->init_size, strsect,
 					 info->index.str) | INIT_OFFSET_MASK;
 	DEBUGP("\t%s\n", info->secstrings + strsect->sh_name);
-
-	/* Append room for core symbols' strings at end of core part. */
-	info->stroffs = mod->core_size;
-
-	/* First strtab byte (and first symtab entry) are zeroes. */
-	__set_bit(0, info->strmap);
-	mod->core_size += bitmap_weight(info->strmap, strsect->sh_size);
 }
 
 static void add_kallsyms(struct module *mod, const struct load_info *info)
@@ -2246,22 +2237,19 @@ static void add_kallsyms(struct module *
 		mod->symtab[i].st_info = elf_type(&mod->symtab[i], info);
 
 	mod->core_symtab = dst = mod->module_core + info->symoffs;
+	mod->core_strtab = s = mod->module_core + info->stroffs;
 	src = mod->symtab;
 	*dst = *src;
+	*s++ = 0;
 	for (ndst = i = 1; i < mod->num_symtab; ++i, ++src) {
 		if (!is_core_symbol(src, info->sechdrs, info->hdr->e_shnum))
 			continue;
+
 		dst[ndst] = *src;
-		dst[ndst].st_name = bitmap_weight(info->strmap,
-						  dst[ndst].st_name);
-		++ndst;
+		dst[ndst++].st_name = s - mod->core_strtab;
+		s += strlcpy(s, &mod->strtab[src->st_name], KSYM_NAME_LEN) + 1;
 	}
 	mod->core_num_syms = ndst;
-
-	mod->core_strtab = s = mod->module_core + info->stroffs;
-	for (*s = 0, i = 1; i < info->sechdrs[info->index.str].sh_size; ++i)
-		if (test_bit(i, info->strmap))
-			*++s = mod->strtab[i];
 }
 #else
 static inline void layout_symtab(struct module *mod, struct load_info *info)
@@ -2751,27 +2739,18 @@ static struct module *layout_and_allocat
 	   this is done generically; there doesn't appear to be any
 	   special cases for the architectures. */
 	layout_sections(mod, info);
-
-	info->strmap = kzalloc(BITS_TO_LONGS(info->sechdrs[info->index.str].sh_size)
-			 * sizeof(long), GFP_KERNEL);
-	if (!info->strmap) {
-		err = -ENOMEM;
-		goto free_percpu;
-	}
 	layout_symtab(mod, info);
 
 	/* Allocate and move to the final place */
 	err = move_module(mod, info);
 	if (err)
-		goto free_strmap;
+		goto free_percpu;
 
 	/* Module has been copied to its final place now: return it. */
 	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
 	kmemleak_load_module(mod, info);
 	return mod;
 
-free_strmap:
-	kfree(info->strmap);
 free_percpu:
 	percpu_modfree(mod);
 out:
@@ -2781,7 +2760,6 @@ out:
 /* mod is no longer valid after this! */
 static void module_deallocate(struct module *mod, struct load_info *info)
 {
-	kfree(info->strmap);
 	percpu_modfree(mod);
 	module_free(mod, mod->module_init);
 	module_free(mod, mod->module_core);
@@ -2911,8 +2889,7 @@ static struct module *load_module(void _
 	if (err < 0)
 		goto unlink;
 
-	/* Get rid of temporary copy and strmap. */
-	kfree(info.strmap);
+	/* Get rid of temporary copy. */
 	free_copy(&info);
 
 	/* Done! */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2 2/2] module: Fix performance regression on modules with large symbol tables
  2011-11-21  5:06       ` Rusty Russell
  2011-11-21  6:29         ` Kevin Cernekee
@ 2011-11-21  8:08         ` Jan Beulich
  1 sibling, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2011-11-21  8:08 UTC (permalink / raw)
  To: Kevin Cernekee, Rusty Russell; +Cc: linux-kbuild, linux-kernel

>>> On 21.11.11 at 06:06, Rusty Russell <rusty@rustcorp.com.au> wrote:
> On Thu, 17 Nov 2011 19:15:02 -0800, Kevin Cernekee <cernekee@gmail.com> wrote:
>> Potential memory savings from reusing EXACT strtab matches: 36,475 bytes
>> total; 35,432 from nvidia.ko.  For comparison, discarding non-core
>> strtab entries saved about 91KB on my system.
> ...
>> Original scheme: 1.230s
>> With patch V2: 0.280s
>> With naive copying: 0.058s
> 
> I'm deeply tempted.  It's very simple, 46 lines shorter, preserves exact
> matches, and doesn't have any strange slowdowns on corner cases.
> 
> But like Kevin, I could be convinced either way.  Jan?

I'd be fine with the simpler solution, as long it is clearly understood
(and marked as intended to be that was for the time being) that it
has room for improvement, should it ever turn out to yield more
significant savings in other contexts.

Jan


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-11-22  0:59 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-13  3:08 [PATCH V2 1/2] module: Add comments describing how the "strmap" logic works Kevin Cernekee
2011-11-13  3:08 ` [PATCH V2 2/2] module: Fix performance regression on modules with large symbol tables Kevin Cernekee
2011-11-14  0:35   ` Rusty Russell
2011-11-14  8:48   ` Jan Beulich
2011-11-16  0:24     ` Rusty Russell
2011-11-18  3:15     ` Kevin Cernekee
2011-11-21  5:06       ` Rusty Russell
2011-11-21  6:29         ` Kevin Cernekee
2011-11-21 23:57           ` Rusty Russell
2011-11-21  8:08         ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox