* [PATCH] kallsyms: speed up /proc/kallsyms
@ 2004-08-31 20:26 Paulo Marques
2004-09-01 5:24 ` Rusty Russell
2004-09-01 11:38 ` [PATCH] kallsyms: speed up /proc/kallsyms Paulo Marques
0 siblings, 2 replies; 12+ messages in thread
From: Paulo Marques @ 2004-08-31 20:26 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton, Rusty Russell, Matt Mackall
This patch implements the "is_exported" bit in the kallsyms_names
compressed stream, so that a "cat /proc/kallsyms" doesn't call
is_exported on every iteration.
It also adds all the suggestions from Matt Mackall and others,
and fixes the compilation error "Inconsistent kallsyms data"
when compiling with KALLSYMS_ALL.
A "time cat /proc/kallsyms" goes from 0.25s without the patch to
0.00s with the patch. (Pentium4 2.8GHz, defconfig)
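To make the stream format concrete, here is a rough Python sketch of the
decoding side (illustrative only; the token values are made up, and the
real code is kallsyms_expand_symbol() in kernel/kallsyms.c):

```python
# Each symbol in kallsyms_names is stored as:
#   [bit 7: is_exported | bits 6..0: len][len bytes of token indices]
def expand_symbol(names, token_table, off):
    length = names[off] & 0x7F           # mask out the "is_exported" bit
    exported = bool(names[off] & 0x80)
    result = ""
    for b in names[off + 1 : off + 1 + length]:
        result += token_table[b]         # every byte indexes a token string
    return result, exported, off + 1 + length  # also return next offset

# toy stream: token 0 is "acpi_", token 1 is "init"; one exported symbol
tokens = ["acpi_", "init"]
stream = bytes([0x82, 0, 1])             # len = 2, bit 7 (exported) set
name, exported, next_off = expand_symbol(stream, tokens, 0)
# name == "acpi_init", exported == True, next_off == 3
```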
There is still a problem with the previous patch (and it probably
remains in this patch too, but that needs testing), reported by
wli: on big-endian machines the result is gibberish. wli
observed this on a sparc64.
Reading through the code, I cannot find anything that assumes
any particular endianness.
I really need someone with sparc64 hardware to do a kernel
build and then send me the ".tmp_kallsyms2.S" and "System.map"
files (gzip'd and in private), so that I can try to track down
the problem.
Also, if someone with some other big-endian machine could test
it, we would know whether the problem is sparc64 specific or
not.
The patch was built against 2.6.9-rc1-mm2.
As always, comments, suggestions, etc. are welcome.
--
Paulo Marques - www.grupopie.com
To err is human, but to really foul things up requires a computer.
Farmers' Almanac, 1978
Signed-Off-By: Paulo Marques <pmarques@grupopie.com>
kernel/kallsyms.c | 67 +++++++++++++------
scripts/kallsyms.c | 185 ++++++++++++++++++++++++++++++++++++++++++++---------
2 files changed, 202 insertions(+), 50 deletions(-)
diff -uprN -X ../dontdiff linux-2.6.9-rc1-mm2/kernel/kallsyms.c linux-2.6.9-rc1-mm2-kall/kernel/kallsyms.c
--- linux-2.6.9-rc1-mm2/kernel/kallsyms.c 2004-08-31 11:53:34.000000000 +0100
+++ linux-2.6.9-rc1-mm2-kall/kernel/kallsyms.c 2004-08-31 21:18:10.000000000 +0100
@@ -4,13 +4,12 @@
* Rewritten and vastly simplified by Rusty Russell for in-kernel
* module loader:
* Copyright 2002 Rusty Russell <rusty@rustcorp.com.au> IBM Corporation
- * Stem compression by Andi Kleen.
*
* ChangeLog:
*
- * (25/Aug/2004) Paulo Marques
+ * (25/Aug/2004) Paulo Marques <pmarques@grupopie.com>
* Changed the compression method from stem compression to "table lookup"
- * compression
+ * compression (see scripts/kallsyms.c for a more complete description)
*/
#include <linux/kallsyms.h>
#include <linux/module.h>
@@ -48,40 +47,61 @@ static inline int is_kernel_text(unsigne
return 0;
}
+/* expand compressed symbol data into the resulting uncompressed string,
+ given the offset of the symbol in the compressed stream */
static unsigned int kallsyms_expand_symbol(unsigned int off, char *result)
{
- int len, tlen;
+ int len;
u8 *tptr, *data;
+ /* get the compressed symbol length from the first symbol byte,
+ * masking out the "is_exported" bit */
data = &kallsyms_names[off];
+ len = (*data) & 0x7F;
+ data++;
- len=*data++;
+ /* update the offset to return the offset for the next symbol in
+ the compressed stream */
off += len + 1;
+
+ /* for every byte on the compressed symbol data, copy the table
+ entry for that byte */
while(len) {
- tptr=&kallsyms_token_table[kallsyms_token_index[*data]];
+ tptr = &kallsyms_token_table[ kallsyms_token_index[*data] ];
data++;
len--;
- tlen=*tptr++;
- while(tlen) {
- *result++=*tptr++;
- tlen--;
+ while (*tptr) {
+ *result = *tptr;
+ result++;
+ tptr++;
}
}
- *result = 0;
+ *result = '\0';
+ /* return the offset to the next symbol */
return off;
}
+/* find the offset in the compressed stream given an index into the
+ kallsyms array */
static unsigned int get_symbol_offset(unsigned long pos)
{
u8 *name;
int i;
+ /* use the closest marker we have. We have markers every
+ 256 positions, so that should be close enough */
name = &kallsyms_names[ kallsyms_markers[pos>>8] ];
+
+ /* sequentially scan all the symbols up to the point we're
+ searching for. Every symbol is stored in a
+ [bit 7: is_exported | bits 6..0: <len>][<len> bytes of data]
+ format, so we just need to add the len to the current
+ pointer for every symbol we wish to skip */
for(i = 0; i < (pos&0xFF); i++)
- name = name + (*name) + 1;
+ name = name + ((*name) & 0x7F) + 1;
return name - kallsyms_names;
}
@@ -122,12 +142,16 @@ const char *kallsyms_lookup(unsigned lon
/* do a binary search on the sorted kallsyms_addresses array */
low = 0;
high = kallsyms_num_syms;
+
while (high-low > 1) {
mid = (low + high) / 2;
if (kallsyms_addresses[mid] <= addr) low = mid;
else high = mid;
}
- while (low && kallsyms_addresses[low-1] == kallsyms_addresses[low])
+
+ /* search for the first aliased symbol. Aliased symbols are
+ symbols with the same address */
+ while (low && kallsyms_addresses[low - 1] == kallsyms_addresses[low])
--low;
/* Grab name */
@@ -141,8 +165,8 @@ const char *kallsyms_lookup(unsigned lon
}
}
+ /* if we found no next symbol, we use the end of the section */
if (!symbol_end) {
- /* At worst, symbol ends at end of section. */
if (is_kernel_inittext(addr))
symbol_end = (unsigned long)_einittext;
else
@@ -182,7 +206,7 @@ void __print_symbol(const char *fmt, uns
printk(fmt, buffer);
}
-/* To avoid O(n^2) iteration, we carry prefix along. */
+/* To avoid using get_symbol_offset for every symbol, we carry prefix along. */
struct kallsym_iter
{
loff_t pos;
@@ -217,16 +241,20 @@ static unsigned long get_ksymbol_core(st
{
unsigned off = iter->nameoff;
- off = kallsyms_expand_symbol(off, iter->name);
-
iter->owner = NULL;
iter->value = kallsyms_addresses[iter->pos];
+
if (is_kernel_text(iter->value) || is_kernel_inittext(iter->value))
iter->type = 't';
else
iter->type = 'd';
- upcase_if_global(iter);
+ /* check the "is_exported" bit on the compressed stream */
+ if (kallsyms_names[off] & 0x80)
+ iter->type += 'A' - 'a';
+
+ off = kallsyms_expand_symbol(off, iter->name);
+
return off - iter->nameoff;
}
@@ -306,7 +334,8 @@ struct seq_operations kallsyms_op = {
static int kallsyms_open(struct inode *inode, struct file *file)
{
/* We keep iterator in m->private, since normal case is to
- * s_start from where we left off, so we avoid O(N^2). */
+ * s_start from where we left off, so we avoid calling
+ * get_symbol_offset for every symbol */
struct kallsym_iter *iter;
int ret;
diff -uprN -X ../dontdiff linux-2.6.9-rc1-mm2/scripts/kallsyms.c linux-2.6.9-rc1-mm2-kall/scripts/kallsyms.c
--- linux-2.6.9-rc1-mm2/scripts/kallsyms.c 2004-08-31 11:53:34.000000000 +0100
+++ linux-2.6.9-rc1-mm2-kall/scripts/kallsyms.c 2004-08-31 21:09:50.000000000 +0100
@@ -9,10 +9,19 @@
*
* ChangeLog:
*
- * (25/Aug/2004) Paulo Marques
+ * (25/Aug/2004) Paulo Marques <pmarques@grupopie.com>
* Changed the compression method from stem compression to "table lookup"
* compression
*
+ * Table compression uses all the unused char codes on the symbols and
+ * maps these to the most used substrings (tokens). For instance, it might
+ * map char code 0xF7 to represent "write_" and then in every symbol where
+ * "write_" appears it can be replaced by 0xF7, saving 5 bytes.
+ * The used codes themselves are also placed in the table so that the
+ * decompression can work without "special cases".
+ * Applied to kernel symbols, this usually produces a compression ratio
+ * of about 50%.
+ *
*/
#include <stdio.h>
@@ -20,28 +29,38 @@
#include <string.h>
#include <ctype.h>
-/* compression tunning settings */
+/* maximum token length used. It doesn't pay to increase it a lot, because
+ * very long substrings probably don't repeat themselves too often. */
#define MAX_TOK_SIZE 11
#define KSYM_NAME_LEN 127
/* we use only a subset of the complete symbol table to gather the token count,
- to speed up compression, at the expense of a little compression ratio
-*/
+ * to speed up compression, at the expense of a little compression ratio */
#define WORKING_SET 1024
+
+/* first find the best token only in the list of tokens that would profit more
+ * than GOOD_BAD_THRESHOLD. Only if this list is empty do we go to the "bad" list.
+ * Increasing this value will put fewer tokens on the "good" list, so the search
+ * is faster. However, if the good list runs out of tokens, we must painfully
+ * search the bad list. */
#define GOOD_BAD_THRESHOLD 10
+/* token hash parameters */
#define HASH_BITS 18
#define HASH_TABLE_SIZE (1 << HASH_BITS)
#define HASH_MASK (HASH_TABLE_SIZE - 1)
#define HASH_BASE_OFFSET 2166136261U
#define HASH_FOLD(a) ((a)&(HASH_MASK))
+/* flags to mark symbols */
+#define SYM_FLAG_VALID 1
+#define SYM_FLAG_SAMPLED 2
+#define SYM_FLAG_EXPORTED 4
struct sym_entry {
unsigned long long addr;
char type;
- char sample;
- char valid;
+ unsigned char flags;
unsigned char len;
unsigned char *sym;
};
@@ -49,23 +68,28 @@ struct sym_entry {
static struct sym_entry *table;
static int size, cnt;
-static unsigned long long _stext, _etext, _sinittext, _einittext;
+static unsigned long long _stext, _etext, _sinittext, _einittext, _start_ksymtab, _stop_ksymtab;
static int all_symbols = 0;
+/* array of pointers into the symbol table sorted by name */
+static struct sym_entry **sorted_table;
struct token {
unsigned char data[MAX_TOK_SIZE];
unsigned char len;
+ /* profit: the number of bytes that could be saved by inserting this
+ * token into the table */
int profit;
- struct token *next;
- struct token *right;
- struct token *left;
- struct token *smaller;
+ struct token *next; /* next token on the hash list */
+ struct token *right; /* next token on the good/bad list */
+ struct token *left; /* previous token on the good/bad list */
+ struct token *smaller; /* token that is less one letter than this one */
};
struct token bad_head, good_head;
struct token *hash_table[HASH_TABLE_SIZE];
+/* the table that holds the result of the compression */
unsigned char best_table[256][MAX_TOK_SIZE+1];
unsigned char best_table_len[256];
@@ -101,6 +125,10 @@ read_symbol(FILE *in, struct sym_entry *
_sinittext = s->addr;
else if (strcmp(str, "_einittext") == 0)
_einittext = s->addr;
+ else if (strcmp(str, "__start___ksymtab") == 0)
+ _start_ksymtab = s->addr;
+ else if (strcmp(str, "__stop___ksymtab") == 0)
+ _stop_ksymtab = s->addr;
else if (toupper(s->type) == 'A' || toupper(s->type) == 'U')
return -1;
@@ -126,7 +154,10 @@ symbol_valid(struct sym_entry *s)
if (strstr(s->sym, "_compiled.") ||
strcmp(s->sym, "kallsyms_addresses") == 0 ||
strcmp(s->sym, "kallsyms_num_syms") == 0 ||
- strcmp(s->sym, "kallsyms_names") == 0)
+ strcmp(s->sym, "kallsyms_names") == 0 ||
+ strcmp(s->sym, "kallsyms_markers") == 0 ||
+ strcmp(s->sym, "kallsyms_token_table") == 0 ||
+ strcmp(s->sym, "kallsyms_token_index") == 0)
return 0;
/* Exclude linker generated symbols which vary between passes */
@@ -161,16 +192,21 @@ static void output_label(char *label)
printf("%s:\n",label);
}
+/* uncompress a compressed symbol. When this function is called, the best table
+ * might still be compressed itself, so the function needs to be recursive */
static int expand_symbol(unsigned char *data, int len, char *result)
{
int c, rlen, total=0;
while (len) {
c = *data;
+ /* if the table holds a single char that is the same as the one
+ * we are looking for, then end the search */
if (best_table[c][0]==c && best_table_len[c]==1) {
*result++ = c;
total++;
} else {
+ /* if not, recurse and expand */
rlen = expand_symbol(best_table[c], best_table_len[c], result);
total += rlen;
result += rlen;
@@ -205,7 +241,7 @@ write_src(void)
output_label("kallsyms_addresses");
valid = 0;
for (i = 0; i < cnt; i++) {
- if (table[i].valid) {
+ if (table[i].flags & SYM_FLAG_VALID) {
printf("\tPTR\t%#llx\n", table[i].addr);
valid++;
}
@@ -216,6 +252,8 @@ write_src(void)
printf("\tPTR\t%d\n", valid);
printf("\n");
+ /* table of offset markers that give the offset in the compressed stream
+ * every 256 symbols */
markers = (unsigned int *) malloc(sizeof(unsigned int)*((valid + 255) / 256));
output_label("kallsyms_names");
@@ -223,13 +261,15 @@ write_src(void)
off = 0;
for (i = 0; i < cnt; i++) {
- if (!table[i].valid)
+ if (!(table[i].flags & SYM_FLAG_VALID))
continue;
if ((valid & 0xFF) == 0)
markers[valid >> 8] = off;
- printf("\t.byte 0x%02x", table[i].len);
+ k = table[i].len;
+ if (table[i].flags & SYM_FLAG_EXPORTED) k |= 0x80;
+ printf("\t.byte 0x%02x", k);
for (k = 0; k < table[i].len; k++)
printf(", 0x%02x", table[i].sym[k]);
printf("\n");
@@ -244,14 +284,15 @@ write_src(void)
printf("\tPTR\t%d\n", markers[i]);
printf("\n");
+ free(markers);
+
output_label("kallsyms_token_table");
off = 0;
for (i = 0; i < 256; i++) {
best_idx[i] = off;
expand_symbol(best_table[i],best_table_len[i],buf);
- k = strlen(buf);
- printf("\t.byte 0x%02x\n\t.ascii\t\"%s\"\n", k, buf);
- off += k + 1;
+ printf("\t.asciz\t\"%s\"\n", buf);
+ off += strlen(buf) + 1;
}
printf("\n");
@@ -280,6 +321,7 @@ static unsigned int hash_token(unsigned
return HASH_FOLD(hash);
}
+/* find a token given its data and hash value */
static struct token *find_token_hash(unsigned char *data, int len, unsigned int hash)
{
struct token *ptr;
@@ -309,6 +351,9 @@ static inline void remove_token_from_gro
ptr->right->left = ptr->left;
}
+
+/* build the counts for all the tokens that start with "data", and have lengths
+ * from 2 to "len" */
static void learn_token(unsigned char *data, int len)
{
struct token *ptr,*last_ptr;
@@ -319,6 +364,7 @@ static void learn_token(unsigned char *d
if (len > MAX_TOK_SIZE)
len = MAX_TOK_SIZE;
+ /* calculate and store the hash values for all the sub-tokens */
hash = rehash_token(hash, data[0]);
for (i = 2; i <= len; i++) {
hash = rehash_token(hash, data[i-1]);
@@ -334,10 +380,19 @@ static void learn_token(unsigned char *d
if (!ptr) ptr = find_token_hash(data, i, hash);
if (!ptr) {
+ /* create a new token entry */
ptr = (struct token *) malloc(sizeof(*ptr));
+
memcpy(ptr->data, data, i);
ptr->len = i;
+
+ /* when we create an entry, its profit is 0 because
+ * we also take into account the size of the token on
+ * the compressed table. We then subtract GOOD_BAD_THRESHOLD
+ * so that the test to see if this token belongs to
+ * the good or bad list is a comparison to zero */
ptr->profit = -GOOD_BAD_THRESHOLD;
+
ptr->next = hash_table[hash];
hash_table[hash] = ptr;
@@ -346,11 +401,13 @@ static void learn_token(unsigned char *d
ptr->smaller = NULL;
} else {
newprofit = ptr->profit + (ptr->len - 1);
+ /* check to see if this token needs to be moved to a
+ * different list */
if((ptr->profit < 0) && (newprofit >= 0)) {
remove_token_from_group(ptr);
insert_token_in_group(&good_head,ptr);
}
- ptr->profit = newprofit;
+ ptr->profit = newprofit;
}
if (last_ptr) last_ptr->smaller = ptr;
@@ -360,6 +417,10 @@ static void learn_token(unsigned char *d
}
}
+/* decrease the counts for all the tokens that start with "data", and have lengths
+ * from 2 to "len". This function is much simpler than learn_token because we have
+ * more guarantees (the tokens exist, the ->smaller pointer is set, etc.)
+ * The two separate functions exist only because of compression performance */
static void forget_token(unsigned char *data, int len)
{
struct token *ptr;
@@ -384,6 +445,7 @@ static void forget_token(unsigned char *
}
}
+/* count all the possible tokens in a symbol */
static void learn_symbol(unsigned char *symbol, int len)
{
int i;
@@ -392,6 +454,7 @@ static void learn_symbol(unsigned char *
learn_token(symbol + i, len - i);
}
+/* decrease the count for all the possible tokens in a symbol */
static void forget_symbol(unsigned char *symbol, int len)
{
int i;
@@ -400,49 +463,98 @@ static void forget_symbol(unsigned char
forget_token(symbol + i, len - i);
}
+static int symbol_sort(const void *a, const void *b)
+{
+ return strcmp( (*((struct sym_entry **) a))->sym,
+ (*((struct sym_entry **) b))->sym );
+}
+
+
+/* find out if a symbol is exported. Exported symbols have a corresponding
+ * __ksymtab_<symbol> entry and their addresses are between __start___ksymtab
+ * and __stop___ksymtab */
+static int is_exported(char *name)
+{
+ struct sym_entry key, *ksym, **result;
+ char buf[KSYM_NAME_LEN+32];
+
+ sprintf(buf, "__ksymtab_%s", name);
+ key.sym = buf;
+
+ ksym = &key;
+ result = bsearch(&ksym, sorted_table, cnt,
+ sizeof(struct sym_entry *), symbol_sort);
+
+ if(!result) return 0;
+
+ ksym = *result;
+
+ return ((ksym->addr >= _start_ksymtab) && (ksym->addr < _stop_ksymtab));
+}
+
+/* set all the symbol flags and do the initial token count */
static void build_initial_tok_table(void)
{
int i, use_it, valid;
+ /* build a sorted symbol pointer array so that searching a particular
+ * symbol is faster */
+ sorted_table = (struct sym_entry **) malloc(sizeof(struct sym_entry *) * cnt);
+ for (i = 0; i < cnt; i++)
+ sorted_table[i] = &table[i];
+ qsort(sorted_table, cnt, sizeof(struct sym_entry *), symbol_sort);
+
valid = 0;
for (i = 0; i < cnt; i++) {
- table[i].valid = symbol_valid(&table[i]);
- if (table[i].valid) valid++;
+ table[i].flags = 0;
+ if ( symbol_valid(&table[i]) ) {
+ table[i].flags |= SYM_FLAG_VALID;
+ valid++;
+ }
}
use_it = 0;
for (i = 0; i < cnt; i++) {
- table[i].sample = 0;
- if (table[i].valid) {
+ if (table[i].flags & SYM_FLAG_VALID) {
+
use_it += WORKING_SET;
+
if (use_it >= valid) {
- table[i].sample = 1;
+ table[i].flags |= SYM_FLAG_SAMPLED;
use_it -= valid;
}
+
+ if( is_exported(table[i].sym) )
+ table[i].flags |= SYM_FLAG_EXPORTED;
}
- if (table[i].sample)
+ if (table[i].flags & SYM_FLAG_SAMPLED)
learn_symbol(table[i].sym, table[i].len);
}
}
+/* replace a given token in all the valid symbols. Use the sampled symbols
+ * to update the counts */
static void compress_symbols(unsigned char *str, int tlen, int idx)
{
int i, len, learn, size;
unsigned char *p;
for (i = 0; i < cnt; i++) {
- if (!table[i].valid) continue;
+
+ if (!(table[i].flags & SYM_FLAG_VALID)) continue;
len = table[i].len;
learn = 0;
p = table[i].sym;
do {
+ /* find the token on the symbol */
p = (unsigned char *) strstr((char *) p, (char *) str);
if (!p) break;
if (!learn) {
- if (table[i].sample)
+ /* if this symbol was used to count, decrease it */
+ if (table[i].flags & SYM_FLAG_SAMPLED)
forget_symbol(table[i].sym, len);
learn = 1;
}
@@ -457,11 +569,14 @@ static void compress_symbols(unsigned ch
if(learn) {
table[i].len = len;
- if(table[i].sample) learn_symbol(table[i].sym, len);
+ /* if this symbol was used to count, learn it again */
+ if(table[i].flags & SYM_FLAG_SAMPLED)
+ learn_symbol(table[i].sym, len);
}
}
}
+/* search for the token with the maximum profit */
static struct token *find_best_token(void)
{
struct token *ptr,*best,*head;
@@ -486,29 +601,37 @@ static struct token *find_best_token(voi
return best;
}
+/* this is the core of the algorithm: calculate the "best" table */
static void optimize_result(void)
{
struct token *best;
int i;
/* using the '\0' symbol last allows compress_symbols to use standard
- fast string functions
- */
+ * fast string functions */
for (i = 255; i >= 0; i--) {
+
+ /* if this table slot is empty (i.e. it is not used by an
+ * actual original char code) */
if (!best_table_len[i]) {
+
+ /* find the token with the greatest profit value */
best = find_best_token();
+ /* place it in the "best" table */
best_table_len[i] = best->len;
memcpy(best_table[i], best->data, best_table_len[i]);
/* zero terminate the token so that we can use strstr
in compress_symbols */
best_table[i][best_table_len[i]]='\0';
+ /* replace this token in all the valid symbols */
compress_symbols(best_table[i], best_table_len[i], i);
}
}
}
+/* start by placing the symbols that are actually used on the table */
static void insert_real_symbols_in_table(void)
{
int i, j, c;
@@ -517,7 +640,7 @@ static void insert_real_symbols_in_table
memset(best_table_len, 0, sizeof(best_table_len));
for (i = 0; i < cnt; i++) {
- if (table[i].valid) {
+ if (table[i].flags & SYM_FLAG_VALID) {
for (j = 0; j < table[i].len; j++) {
c = table[i].sym[j];
best_table[c][0]=c;
* Re: [PATCH] kallsyms: speed up /proc/kallsyms
2004-08-31 20:26 [PATCH] kallsyms: speed up /proc/kallsyms Paulo Marques
@ 2004-09-01 5:24 ` Rusty Russell
2004-09-01 11:17 ` Paulo Marques
2004-09-01 11:38 ` [PATCH] kallsyms: speed up /proc/kallsyms Paulo Marques
1 sibling, 1 reply; 12+ messages in thread
From: Rusty Russell @ 2004-09-01 5:24 UTC (permalink / raw)
To: Paulo Marques; +Cc: linux-kernel, Andrew Morton, Matt Mackall
On Wed, 2004-09-01 at 06:26, Paulo Marques wrote:
> This patch implements the "is_exported" bit in the kallsyms_names
> compressed stream, so that a "cat /proc/kallsyms" doesn't call
> is_exported on every iteration.
Prefer the patch split into "comments", "inconsistent kallsyms data fix"
and "speedup". I also prefer using a whole letter over a single bit:
this allows archs which have weird nm letters to express them, and
instead of case indicating what symbols are exported, we get the real
correct results.
Thanks,
Rusty.
--
Anyone who quotes me in their signature is an idiot -- Rusty Russell
* Re: [PATCH] kallsyms: speed up /proc/kallsyms
2004-09-01 5:24 ` Rusty Russell
@ 2004-09-01 11:17 ` Paulo Marques
2004-09-01 19:27 ` Sam Ravnborg
0 siblings, 1 reply; 12+ messages in thread
From: Paulo Marques @ 2004-09-01 11:17 UTC (permalink / raw)
To: Rusty Russell; +Cc: linux-kernel, Andrew Morton, Matt Mackall
Rusty Russell wrote:
> On Wed, 2004-09-01 at 06:26, Paulo Marques wrote:
>
>>This patch implements the "is_exported" bit in the kallsyms_names
>>compressed stream, so that a "cat /proc/kallsyms" doesn't call
>>is_exported on every iteration.
>
>
> Prefer the patch split into "comments", "inconsistent kallsyms data fix"
> and "speedup". I also prefer using a whole letter over a single bit:
> this allows archs which have weird nm letters to express them, and
> instead of case indicating what symbols are exported, we get the real
> correct results.
I'm still new to this :(
I'll send more fine-grained patches next time, grouped by issue addressed.
The single bit approach was meant to keep the current behavior, because
that is what I thought you wanted:
> The current code is simple. We could reserve the first letter in
> kallsyms_names for the type letter from System.map. The current upcase
> semantics are deliberately distorted to be more kernel-relevant (ie.
> exported are upper case) but simplistic.
>
> That's how I'd recommend "fixing" it.
I read this as: "we could have used the first letter in each symbol for
the type letter from System.map, but we deliberately distorted the
current semantics to be more kernel-relevant."
Since we are in different time zones, we only get to send an email a
day, so asking for confirmation and waiting for the reply would take a
long time... :(
Anyway, I'm assuming now that what we really want is the same type chars
that appear on the System.map file. If compiled with KALLSYMS_ALL, the
/proc/kallsyms file would be an almost exact copy of System.map.
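In other words, each /proc/kallsyms line would then carry the same
"address type name" triple as a System.map line. A tiny hypothetical
sketch of that output format (the address and symbol are just examples):

```python
# hypothetical: render one /proc/kallsyms line, System.map style
def format_line(addr, type_chr, name):
    return f"{addr:08x} {type_chr} {name}"

print(format_line(0xC0100000, "T", "_stext"))
# c0100000 T _stext
```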
So, moving forward...
A defconfig build produces 13743 symbols with a compressed name stream
of ~130kB. (it is 240kB uncompressed, for the curious)
Adding a letter to each symbol would increase this by about 10%.
We can try 2 different approaches to minimize the impact of this:
- have the letter inserted before the compression step. This way, the
table of the best tokens may have "tacpi_" instead of "acpi_" and
the compression would not suffer as much, except that the symbols
started with "Tacpi_" would suffer. Only real tests can show how
this would turn out.
- build a "sections" table that groups together symbols with the same
letter. The table would say symbols that have addresses between
X and Y would have letter Z. This can go horribly wrong if there
are situations where completely different type letters appear
intermixed.
I think I'll try the first approach first and see how it goes. I'll
post as soon as I have some numbers.
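Just to show what the first approach means in practice (hypothetical
symbols, not real kernel ones): the type letter simply becomes the first
byte of each symbol before the token counting runs, so repeated
(type, prefix) pairs can themselves become tokens:

```python
# sketch of approach 1: prepend the nm type letter to every symbol
# before compression, so the token search can absorb it
symbols = [("t", "acpi_early_init"), ("t", "acpi_bus_init"), ("T", "printk")]

# the strings that would be handed to the token-compression pass
precompression = [type_chr + name for type_chr, name in symbols]
# ["tacpi_early_init", "tacpi_bus_init", "Tprintk"]
# "tacpi_" now repeats, so it is a candidate token just like "acpi_" was
```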
--
Paulo Marques - www.grupopie.com
To err is human, but to really foul things up requires a computer.
Farmers' Almanac, 1978
* Re: [PATCH] kallsyms: speed up /proc/kallsyms
2004-09-01 11:17 ` Paulo Marques
@ 2004-09-01 19:27 ` Sam Ravnborg
2004-09-01 19:44 ` Paulo Marques
0 siblings, 1 reply; 12+ messages in thread
From: Sam Ravnborg @ 2004-09-01 19:27 UTC (permalink / raw)
To: Paulo Marques; +Cc: Rusty Russell, linux-kernel, Andrew Morton, Matt Mackall
On Wed, Sep 01, 2004 at 12:17:18PM +0100, Paulo Marques wrote:
> So, moving forward...
>
> A defconfig build produces 13743 symbols with a compressed name stream
> of ~130kB. (it is 240kB uncompressed, for the curious)
>
> Adding a letter to each symbol would increase this by about 10%.
>
> We can try 2 different approaches to minimize the impact of this:
>
> - have the letter inserted before the compression step. This way, the
> table of the best tokens may have "tacpi_" instead of "acpi_" and
> the compression would not suffer as much, except that the symbols
> started with "Tacpi_" would suffer. Only real tests can show how
> this would turn out.
>
> - build a "sections" table that groups together symbols with the same
> letter. The table would say symbols that have addresses between
> X and Y would have letter Z. This can go horribly wrong if there
> are situations where completely different type letters appear
> intermixed.
>
> I think I'll try the first approach first and see how it goes. I'll
> post as soon as I have some numbers.
When you have made the split Rusty requested and implemented
the above could you please send patches to me. I will add them to
my kbuild queue.
Yes - I have accepted your rationale for keeping the split
of functionality between kallsyms and the kernel.
Sam
* Re: [PATCH] kallsyms: speed up /proc/kallsyms
2004-09-01 19:27 ` Sam Ravnborg
@ 2004-09-01 19:44 ` Paulo Marques
2004-09-01 19:51 ` Sam Ravnborg
0 siblings, 1 reply; 12+ messages in thread
From: Paulo Marques @ 2004-09-01 19:44 UTC (permalink / raw)
To: Sam Ravnborg; +Cc: Rusty Russell, linux-kernel, Andrew Morton, Matt Mackall
Sam Ravnborg wrote:
> ...
>
> When you have made the split Rusty requested and implemented
> the above could you please send patches to me. I will add them to
> my kbuild queue.
I'd be glad to do this, but AFAICT the patch already entered the mm
tree, so I think that splitting it now, or sending it through a
different path would probably add to the confusion I already
managed to create :(
Implementing the "type char" should be a single patch on top of this
one, though. I'll be sure to CC you when I post it (probably today).
Thanks,
--
Paulo Marques - www.grupopie.com
To err is human, but to really foul things up requires a computer.
Farmers' Almanac, 1978
* Re: [PATCH] kallsyms: speed up /proc/kallsyms
2004-09-01 19:44 ` Paulo Marques
@ 2004-09-01 19:51 ` Sam Ravnborg
2004-09-02 12:05 ` Paulo Marques
0 siblings, 1 reply; 12+ messages in thread
From: Sam Ravnborg @ 2004-09-01 19:51 UTC (permalink / raw)
To: Paulo Marques
Cc: Sam Ravnborg, Rusty Russell, linux-kernel, Andrew Morton,
Matt Mackall
On Wed, Sep 01, 2004 at 08:44:20PM +0100, Paulo Marques wrote:
> Sam Ravnborg wrote:
> >...
> >
> >When you have made the split Rusty requested and implemented
> >the above could you please send patches to me. I will add them to
> >my kbuild queue.
>
> I'd be glad to do this, but AFAICT the patch already entered the mm
> tree, so I think that splitting it now, or sending it through a
> different path would probably add to the confusion I already
> managed to create :(
I prefer the split-up Rusty requested.
It will then enter -mm via my queue - but as three logical separated
patches. This is much better when looking into this later.
Andrew will just back-out your previous patch and mark it as 'merged'.
Sam
* Re: [PATCH] kallsyms: speed up /proc/kallsyms
2004-09-01 19:51 ` Sam Ravnborg
@ 2004-09-02 12:05 ` Paulo Marques
2004-09-02 22:17 ` Sam Ravnborg
0 siblings, 1 reply; 12+ messages in thread
From: Paulo Marques @ 2004-09-02 12:05 UTC (permalink / raw)
To: Sam Ravnborg; +Cc: Rusty Russell, linux-kernel, Andrew Morton, Matt Mackall
Sam Ravnborg wrote:
> On Wed, Sep 01, 2004 at 08:44:20PM +0100, Paulo Marques wrote:
>
>>Sam Ravnborg wrote:
>>
>>>...
>>>
>>>When you have made the split Rusty requested and implemented
>>>the above could you please send patches to me. I will add them to
>>>my kbuild queue.
>>
>>I'd be glad to do this, but AFAICT the patch already entered the mm
>>tree, so I think that splitting it now, or sending it through a
>>different path would probably add to the confusion I already
>>managed to create :(
>
>
> I prefer the split-up Rusty requested.
> It will then enter -mm via my queue - but as three logical separated
> patches. This is much better when looking into this later.
>
> Andrew will just back-out your previous patch and mark it as 'merged'.
Ok, I'll send you the 3 patches then, no problem.
However, the third patch will already be the "type char"
implementation instead of the "is-exported" bit patch.
All 3 patches will be against 2.6.9-rc1-mm2. I'm just saying
this to make sure I understood correctly what I'm supposed to
do.
Anyway, I did some tests with the "type char" included in the
compressed stream.
The original data: 13743 symbols ~240kB uncompressed data
Compressed:
without type char: 126292 bytes
with type char inserted:
after compression 140035 bytes
before compression 137073 bytes
before compression, lower case 134222 bytes
The last option in this table is to keep the extra bit to say
"the type for this symbol is upper case" and place the type
always in lowercase, to improve compression.
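For scale, relative to the 126292-byte stream without type chars, those
numbers work out to roughly the following overheads (quick
back-of-the-envelope):

```python
base = 126292  # bytes, compressed stream without type chars
variants = {
    "after compression": 140035,
    "before compression": 137073,
    "before compression, lower case": 134222,
}
overheads = {name: 100.0 * (size - base) / base
             for name, size in variants.items()}
for name, pct in overheads.items():
    print(f"{name}: +{pct:.1f}%")
# after compression: +10.9%
# before compression: +8.5%
# before compression, lower case: +6.3%
```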
The gain with the lower case doesn't seem to make up for the
_ugliness_ of the method. Keeping an extra bit together with
the length of the symbol, assuming that the symbol length
will never be more than 127, is not pretty at all and forces
the decompression code to have more "special cases".
Inserting just the type char before compressing seems to be
the cleanest approach.
Note that this is a defconfig setup without the KALLSYMS_ALL
config option. With KALLSYMS_ALL, the compressed stream size
goes to about 300kB and the gains should grow proportionally.
(this reminds me, I should include a patch to change the
help description that says that KALLSYMS_ALL adds about
300kB to the kernel image to say that it adds about 200kB)
I'll try to build all this tonight, and send the new version.
If you don't agree with the "type char" approach that seems
the best to me, please say so now, or forever hold your
peace :)
--
Paulo Marques - www.grupopie.com
To err is human, but to really foul things up requires a computer.
Farmers' Almanac, 1978
* Re: [PATCH] kallsyms: speed up /proc/kallsyms
2004-09-02 12:05 ` Paulo Marques
@ 2004-09-02 22:17 ` Sam Ravnborg
2004-09-03 1:31 ` pmarques
0 siblings, 1 reply; 12+ messages in thread
From: Sam Ravnborg @ 2004-09-02 22:17 UTC (permalink / raw)
To: Paulo Marques
Cc: Sam Ravnborg, Rusty Russell, linux-kernel, Andrew Morton,
Matt Mackall
On Thu, Sep 02, 2004 at 01:05:18PM +0100, Paulo Marques wrote:
> All 3 patches will be against 2.6.9-rc1-mm2. I'm just saying
> this to make sure I understood correctly what I'm supposed to
> do.
Preferable on top of Linus - latest.
Sam
* Re: [PATCH] kallsyms: speed up /proc/kallsyms
2004-09-02 22:17 ` Sam Ravnborg
@ 2004-09-03 1:31 ` pmarques
2004-09-03 1:31 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: pmarques @ 2004-09-03 1:31 UTC (permalink / raw)
To: Sam Ravnborg; +Cc: Rusty Russell, linux-kernel, Andrew Morton, Matt Mackall
Quoting Sam Ravnborg <sam@ravnborg.org>:
> On Thu, Sep 02, 2004 at 01:05:18PM +0100, Paulo Marques wrote:
>
> > All 3 patches will be against 2.6.9-rc1-mm2. I'm just saying
> > this to make sure I understood correctly what I'm supposed to
> > do.
>
> Preferable on top of Linus - latest.
I was preparing to do just that, but bumped into a simple problem.
If I patch against Linus tree, then the 3 patches suggested by
Rusty Russell make no sense, because the Linus tree still has stem
compression. So there is no inconsistency bug and there are no
comments to add, there is only a single patch to go from stem
compression to the new compression scheme.
It does not sound so bad to have just one patch that appears at
2.6.9-rc2 that says "change kallsyms compression scheme", so I
have no problem producing this patch.
I'm holding off for now, to avoid sending patches against
different trees and making a total mess :(
* Re: [PATCH] kallsyms: speed up /proc/kallsyms
2004-09-03 1:31 ` pmarques
@ 2004-09-03 1:31 ` Andrew Morton
2004-09-03 2:59 ` [PATCH] kallsyms: correct type char in /proc/kallsyms Paulo Marques
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2004-09-03 1:31 UTC (permalink / raw)
To: pmarques; +Cc: sam, rusty, linux-kernel, mpm
"" <pmarques@grupopie.com> wrote:
>
> Quoting Sam Ravnborg <sam@ravnborg.org>:
>
> > On Thu, Sep 02, 2004 at 01:05:18PM +0100, Paulo Marques wrote:
> >
> > > All 3 patches will be against 2.6.9-rc1-mm2. I'm just saying
> > > this to make sure I understood correctly what I'm supposed to
> > > do.
> >
> > Preferable on top of Linus - latest.
>
> I was preparing to do just that, but bumped into a simple problem.
>
> If I patch against Linus tree, then the 3 patches suggested by
> Rusty Russell make no sense, because the Linus tree still has stem
> compression. So there is no inconsistency bug and there are no
> comments to add, there is only a single patch to go from stem
> compression to the new compression scheme.
>
> It does not sound so bad to have just one patch that appears at
> 2.6.9-rc2 that says "change kallsyms compression scheme", so I
> have no problem producing this patch.
>
> I'm holding off for now, to avoid sending patches against
> different trees and making a total mess :(
In that case please prepare diffs against -mm. I've dropped a
snapshot patch against 2.6.9-rc1 at
http://www.zip.com.au/~akpm/linux/patches/stuff/x.bz2
* [PATCH] kallsyms: correct type char in /proc/kallsyms
2004-09-03 1:31 ` Andrew Morton
@ 2004-09-03 2:59 ` Paulo Marques
0 siblings, 0 replies; 12+ messages in thread
From: Paulo Marques @ 2004-09-03 2:59 UTC (permalink / raw)
To: Andrew Morton; +Cc: sam, rusty, linux-kernel, mpm
[-- Attachment #1: Type: text/plain, Size: 1155 bytes --]
Andrew Morton wrote:
> ...
>
> In that case please prepare diffs against -mm. I've dropped a
> snapshot patch against 2.6.9-rc1 at
> http://www.zip.com.au/~akpm/linux/patches/stuff/x.bz2
Thanks!
This patch removes the is_exported bit from the last patch and
implements a complete type char, so that /proc/kallsyms
better resembles the System.map file.
In fact, if compiled with KALLSYMS_ALL the only differences
between /proc/kallsyms and System.map are the symbols that are
left out on purpose: types 'A' and 'U', and kallsyms_xxx.
I removed these symbols from System.map and diff'ed against
/proc/kallsyms, and the files were completely identical :)
The System.map file occupied about 980Kb whereas the kallsyms data
needed to generate the same output occupied about 440Kb.
The patch should apply over the last one, in case someone wants
to test it, without needing the -mm snapshot.
I'm sending it attached this time to avoid the problems we had the
last time.
Comments, suggestions, flames are always welcome :)
--
Paulo Marques - www.grupopie.com
To err is human, but to really foul things up requires a computer.
Farmers' Almanac, 1978
[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 9039 bytes --]
--- linux-2.6.9-rc1-mmsnap/kernel/kallsyms.c 2004-09-03 02:59:57.000000000 +0100
+++ linux-2.6.9-rc1-kall/kernel/kallsyms.c 2004-09-03 02:28:01.000000000 +0100
@@ -51,17 +51,16 @@ static inline int is_kernel_text(unsigne
given the offset to where the symbol is in the compressed stream */
static unsigned int kallsyms_expand_symbol(unsigned int off, char *result)
{
- int len;
+ int len, skipped_first = 0;
u8 *tptr, *data;
- /* get the compressed symbol length from the first symbol byte,
- * masking out the "is_exported" bit */
+ /* get the compressed symbol length from the first symbol byte */
data = &kallsyms_names[off];
- len = (*data) & 0x7F;
+ len = *data;
data++;
/* update the offset to return the offset for the next symbol on
- the compressed stream */
+ * the compressed stream */
off += len + 1;
/* for every byte on the compressed symbol data, copy the table
@@ -72,8 +71,11 @@ static unsigned int kallsyms_expand_symb
len--;
while (*tptr) {
- *result = *tptr;
- result++;
+ if(skipped_first) {
+ *result = *tptr;
+ result++;
+ } else
+ skipped_first = 1;
tptr++;
}
}
@@ -84,24 +86,33 @@ static unsigned int kallsyms_expand_symb
return off;
}
+/* get symbol type information. This is encoded as a single char at the
+ * beginning of the symbol name */
+static char kallsyms_get_symbol_type(unsigned int off)
+{
+ /* get just the first code, look it up in the token table, and return the
+ * first char from this token */
+ return kallsyms_token_table[ kallsyms_token_index[ kallsyms_names[off+1] ] ];
+}
+
+
/* find the offset on the compressed stream given an index in the
- kallsyms array */
+ * kallsyms array */
static unsigned int get_symbol_offset(unsigned long pos)
{
u8 *name;
int i;
- /* use the closest marker we have. We have markers every
- 256 positions, so that should be close enough */
+ /* use the closest marker we have. We have markers every 256 positions,
+ * so that should be close enough */
name = &kallsyms_names[ kallsyms_markers[pos>>8] ];
- /* sequentially scan all the symbols up to the point we're
- searching for. Every symbol is stored in a
- [bit 7: is_exported | bits 6..0: <len>][<len> bytes of data]
- format, so we just need to add the len to the current
- pointer for every symbol we wish to skip */
+ /* sequentially scan all the symbols up to the point we're searching for.
+ * Every symbol is stored in a [<len>][<len> bytes of data] format, so we
+ * just need to add the len to the current pointer for every symbol we
+ * wish to skip */
for(i = 0; i < (pos&0xFF); i++)
- name = name + ((*name) & 0x7F) + 1;
+ name = name + (*name) + 1;
return name - kallsyms_names;
}
@@ -243,15 +254,8 @@ static unsigned long get_ksymbol_core(st
iter->owner = NULL;
iter->value = kallsyms_addresses[iter->pos];
-
- if (is_kernel_text(iter->value) || is_kernel_inittext(iter->value))
- iter->type = 't';
- else
- iter->type = 'd';
-
- /* check the "is_exported" bit on the compressed stream */
- if (kallsyms_names[off] & 0x80)
- iter->type += 'A' - 'a';
+
+ iter->type = kallsyms_get_symbol_type(off);
off = kallsyms_expand_symbol(off, iter->name);
--- linux-2.6.9-rc1-mmsnap/scripts/kallsyms.c 2004-09-03 02:59:57.000000000 +0100
+++ linux-2.6.9-rc1-kall/scripts/kallsyms.c 2004-09-03 02:35:25.000000000 +0100
@@ -55,7 +55,6 @@
/* flags to mark symbols */
#define SYM_FLAG_VALID 1
#define SYM_FLAG_SAMPLED 2
-#define SYM_FLAG_EXPORTED 4
struct sym_entry {
unsigned long long addr;
@@ -68,12 +67,9 @@ struct sym_entry {
static struct sym_entry *table;
static int size, cnt;
-static unsigned long long _stext, _etext, _sinittext, _einittext, _start_ksymtab, _stop_ksymtab;
+static unsigned long long _stext, _etext, _sinittext, _einittext;
static int all_symbols = 0;
-/* aray of pointers into the symbol table sorted by name */
-static struct sym_entry **sorted_table;
-
struct token {
unsigned char data[MAX_TOK_SIZE];
unsigned char len;
@@ -125,45 +121,56 @@ read_symbol(FILE *in, struct sym_entry *
_sinittext = s->addr;
else if (strcmp(str, "_einittext") == 0)
_einittext = s->addr;
- else if (strcmp(str, "__start___ksymtab") == 0)
- _start_ksymtab = s->addr;
- else if (strcmp(str, "__stop___ksymtab") == 0)
- _stop_ksymtab = s->addr;
else if (toupper(s->type) == 'A' || toupper(s->type) == 'U')
return -1;
- s->sym = strdup(str);
- s->len = strlen(str);
+ /* include the type field in the symbol name, so that it gets
+ * compressed together */
+ s->len = strlen(str) + 1;
+ s->sym = (char *) malloc(s->len + 1);
+ strcpy(s->sym + 1, str);
+ s->sym[0] = s->type;
+
return 0;
}
static int
symbol_valid(struct sym_entry *s)
{
+ /* Symbols which vary between passes. Passes 1 and 2 must have
+ * identical symbol lists. The kallsyms_* symbols below are only added
+ * after pass 1, they would be included in pass 2 when --all-symbols is
+ * specified so exclude them to get a stable symbol list.
+ */
+ static char *special_symbols[] = {
+ "kallsyms_addresses",
+ "kallsyms_num_syms",
+ "kallsyms_names",
+ "kallsyms_markers",
+ "kallsyms_token_table",
+ "kallsyms_token_index",
+
+ /* Exclude linker generated symbols which vary between passes */
+ "_SDA_BASE_", /* ppc */
+ "_SDA2_BASE_", /* ppc */
+ NULL };
+ int i;
+
+ /* if --all-symbols is not specified, then symbols outside the text
+ * and inittext sections are discarded */
if (!all_symbols) {
if ((s->addr < _stext || s->addr > _etext)
&& (s->addr < _sinittext || s->addr > _einittext))
return 0;
}
- /* Exclude symbols which vary between passes. Passes 1 and 2 must have
- * identical symbol lists. The kallsyms_* symbols below are only added
- * after pass 1, they would be included in pass 2 when --all-symbols is
- * specified so exclude them to get a stable symbol list.
- */
- if (strstr(s->sym, "_compiled.") ||
- strcmp(s->sym, "kallsyms_addresses") == 0 ||
- strcmp(s->sym, "kallsyms_num_syms") == 0 ||
- strcmp(s->sym, "kallsyms_names") == 0 ||
- strcmp(s->sym, "kallsyms_markers") == 0 ||
- strcmp(s->sym, "kallsyms_token_table") == 0 ||
- strcmp(s->sym, "kallsyms_token_index") == 0)
+ /* Exclude symbols which vary between passes. */
+ if (strstr(s->sym + 1, "_compiled."))
return 0;
- /* Exclude linker generated symbols which vary between passes */
- if (strcmp(s->sym, "_SDA_BASE_") == 0 || /* ppc */
- strcmp(s->sym, "_SDA2_BASE_") == 0) /* ppc */
- return 0;
+ for (i = 0; special_symbols[i]; i++)
+ if( strcmp(s->sym + 1, special_symbols[i]) == 0 )
+ return 0;
return 1;
}
@@ -267,9 +274,7 @@ write_src(void)
if ((valid & 0xFF) == 0)
markers[valid >> 8] = off;
- k = table[i].len;
- if (table[i].flags & SYM_FLAG_EXPORTED) k |= 0x80;
- printf("\t.byte 0x%02x", k);
+ printf("\t.byte 0x%02x", table[i].len);
for (k = 0; k < table[i].len; k++)
printf(", 0x%02x", table[i].sym[k]);
printf("\n");
@@ -463,47 +468,11 @@ static void forget_symbol(unsigned char
forget_token(symbol + i, len - i);
}
-static int symbol_sort(const void *a, const void *b)
-{
- return strcmp( (*((struct sym_entry **) a))->sym,
- (*((struct sym_entry **) b))->sym );
-}
-
-
-/* find out if a symbol is exported. Exported symbols have a corresponding
- * __ksymtab_<symbol> entry and their addresses are between __start___ksymtab
- * and __stop___ksymtab */
-static int is_exported(char *name)
-{
- struct sym_entry key, *ksym, **result;
- char buf[KSYM_NAME_LEN+32];
-
- sprintf(buf, "__ksymtab_%s", name);
- key.sym = buf;
-
- ksym = &key;
- result = bsearch(&ksym, sorted_table, cnt,
- sizeof(struct sym_entry *), symbol_sort);
-
- if(!result) return 0;
-
- ksym = *result;
-
- return ((ksym->addr >= _start_ksymtab) && (ksym->addr < _stop_ksymtab));
-}
-
/* set all the symbol flags and do the initial token count */
static void build_initial_tok_table(void)
{
int i, use_it, valid;
- /* build a sorted symbol pointer array so that searching a particular
- * symbol is faster */
- sorted_table = (struct sym_entry **) malloc(sizeof(struct sym_entry *) * cnt);
- for (i = 0; i < cnt; i++)
- sorted_table[i] = &table[i];
- qsort(sorted_table, cnt, sizeof(struct sym_entry *), symbol_sort);
-
valid = 0;
for (i = 0; i < cnt; i++) {
table[i].flags = 0;
@@ -515,6 +484,10 @@ static void build_initial_tok_table(void
use_it = 0;
for (i = 0; i < cnt; i++) {
+
+ /* subsample the available symbols. This method is almost like
+ * a Bresenham's algorithm to get uniformly distributed samples
+ * across the symbol table */
if (table[i].flags & SYM_FLAG_VALID) {
use_it += WORKING_SET;
@@ -523,9 +496,6 @@ static void build_initial_tok_table(void
table[i].flags |= SYM_FLAG_SAMPLED;
use_it -= valid;
}
-
- if( is_exported(table[i].sym) )
- table[i].flags |= SYM_FLAG_EXPORTED;
}
if (table[i].flags & SYM_FLAG_SAMPLED)
learn_symbol(table[i].sym, table[i].len);
* Re: [PATCH] kallsyms: speed up /proc/kallsyms
2004-08-31 20:26 [PATCH] kallsyms: speed up /proc/kallsyms Paulo Marques
2004-09-01 5:24 ` Rusty Russell
@ 2004-09-01 11:38 ` Paulo Marques
1 sibling, 0 replies; 12+ messages in thread
From: Paulo Marques @ 2004-09-01 11:38 UTC (permalink / raw)
To: Paulo Marques; +Cc: linux-kernel, Andrew Morton, Rusty Russell, Matt Mackall
[-- Attachment #1: Type: text/plain, Size: 326 bytes --]
My email client added spaces at the beginning of some of the
lines of the patch, so it doesn't apply.
I'm sending it attached this time to make sure it gets through.
Sorry about this mess,
--
Paulo Marques - www.grupopie.com
To err is human, but to really foul things up requires a computer.
Farmers' Almanac, 1978
[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 19219 bytes --]
diff -uprN -X ../dontdiff linux-2.6.9-rc1-mm2/kernel/kallsyms.c linux-2.6.9-rc1-mm2-kall/kernel/kallsyms.c
--- linux-2.6.9-rc1-mm2/kernel/kallsyms.c 2004-08-31 11:53:34.000000000 +0100
+++ linux-2.6.9-rc1-mm2-kall/kernel/kallsyms.c 2004-08-31 21:18:10.000000000 +0100
@@ -4,13 +4,12 @@
* Rewritten and vastly simplified by Rusty Russell for in-kernel
* module loader:
* Copyright 2002 Rusty Russell <rusty@rustcorp.com.au> IBM Corporation
- * Stem compression by Andi Kleen.
*
* ChangeLog:
*
- * (25/Aug/2004) Paulo Marques
+ * (25/Aug/2004) Paulo Marques <pmarques@grupopie.com>
* Changed the compression method from stem compression to "table lookup"
- * compression
+ * compression (see scripts/kallsyms.c for a more complete description)
*/
#include <linux/kallsyms.h>
#include <linux/module.h>
@@ -48,40 +47,61 @@ static inline int is_kernel_text(unsigne
return 0;
}
+/* expand a compressed symbol data into the resulting uncompressed string,
+ given the offset to where the symbol is in the compressed stream */
static unsigned int kallsyms_expand_symbol(unsigned int off, char *result)
{
- int len, tlen;
+ int len;
u8 *tptr, *data;
+ /* get the compressed symbol length from the first symbol byte,
+ * masking out the "is_exported" bit */
data = &kallsyms_names[off];
+ len = (*data) & 0x7F;
+ data++;
- len=*data++;
+ /* update the offset to return the offset for the next symbol on
+ the compressed stream */
off += len + 1;
+
+ /* for every byte on the compressed symbol data, copy the table
+ entry for that byte */
while(len) {
- tptr=&kallsyms_token_table[kallsyms_token_index[*data]];
+ tptr = &kallsyms_token_table[ kallsyms_token_index[*data] ];
data++;
len--;
- tlen=*tptr++;
- while(tlen) {
- *result++=*tptr++;
- tlen--;
+ while (*tptr) {
+ *result = *tptr;
+ result++;
+ tptr++;
}
}
- *result = 0;
+ *result = '\0';
+ /* return to offset to the next symbol */
return off;
}
+/* find the offset on the compressed stream given an index in the
+ kallsyms array */
static unsigned int get_symbol_offset(unsigned long pos)
{
u8 *name;
int i;
+ /* use the closest marker we have. We have markers every
+ 256 positions, so that should be close enough */
name = &kallsyms_names[ kallsyms_markers[pos>>8] ];
+
+ /* sequentially scan all the symbols up to the point we're
+ searching for. Every symbol is stored in a
+ [bit 7: is_exported | bits 6..0: <len>][<len> bytes of data]
+ format, so we just need to add the len to the current
+ pointer for every symbol we wish to skip */
for(i = 0; i < (pos&0xFF); i++)
- name = name + (*name) + 1;
+ name = name + ((*name) & 0x7F) + 1;
return name - kallsyms_names;
}
@@ -122,12 +142,16 @@ const char *kallsyms_lookup(unsigned lon
/* do a binary search on the sorted kallsyms_addresses array */
low = 0;
high = kallsyms_num_syms;
+
while (high-low > 1) {
mid = (low + high) / 2;
if (kallsyms_addresses[mid] <= addr) low = mid;
else high = mid;
}
- while (low && kallsyms_addresses[low-1] == kallsyms_addresses[low])
+
+ /* search for the first aliased symbol. Aliased symbols are
+ symbols with the same address */
+ while (low && kallsyms_addresses[low - 1] == kallsyms_addresses[low])
--low;
/* Grab name */
@@ -141,8 +165,8 @@ const char *kallsyms_lookup(unsigned lon
}
}
+ /* if we found no next symbol, we use the end of the section */
if (!symbol_end) {
- /* At worst, symbol ends at end of section. */
if (is_kernel_inittext(addr))
symbol_end = (unsigned long)_einittext;
else
@@ -182,7 +206,7 @@ void __print_symbol(const char *fmt, uns
printk(fmt, buffer);
}
-/* To avoid O(n^2) iteration, we carry prefix along. */
+/* To avoid using get_symbol_offset for every symbol, we carry prefix along. */
struct kallsym_iter
{
loff_t pos;
@@ -217,16 +241,20 @@ static unsigned long get_ksymbol_core(st
{
unsigned off = iter->nameoff;
- off = kallsyms_expand_symbol(off, iter->name);
-
iter->owner = NULL;
iter->value = kallsyms_addresses[iter->pos];
+
if (is_kernel_text(iter->value) || is_kernel_inittext(iter->value))
iter->type = 't';
else
iter->type = 'd';
- upcase_if_global(iter);
+ /* check the "is_exported" bit on the compressed stream */
+ if (kallsyms_names[off] & 0x80)
+ iter->type += 'A' - 'a';
+
+ off = kallsyms_expand_symbol(off, iter->name);
+
return off - iter->nameoff;
}
@@ -306,7 +334,8 @@ struct seq_operations kallsyms_op = {
static int kallsyms_open(struct inode *inode, struct file *file)
{
/* We keep iterator in m->private, since normal case is to
- * s_start from where we left off, so we avoid O(N^2). */
+ * s_start from where we left off, so we avoid
+ * using get_symbol_offset for every symbol */
struct kallsym_iter *iter;
int ret;
diff -uprN -X ../dontdiff linux-2.6.9-rc1-mm2/scripts/kallsyms.c linux-2.6.9-rc1-mm2-kall/scripts/kallsyms.c
--- linux-2.6.9-rc1-mm2/scripts/kallsyms.c 2004-08-31 11:53:34.000000000 +0100
+++ linux-2.6.9-rc1-mm2-kall/scripts/kallsyms.c 2004-08-31 21:09:50.000000000 +0100
@@ -9,10 +9,19 @@
*
* ChangeLog:
*
- * (25/Aug/2004) Paulo Marques
+ * (25/Aug/2004) Paulo Marques <pmarques@grupopie.com>
* Changed the compression method from stem compression to "table lookup"
* compression
*
+ * Table compression uses all the unused char codes on the symbols and
+ * maps these to the most used substrings (tokens). For instance, it might
+ * map char code 0xF7 to represent "write_" and then in every symbol where
+ * "write_" appears it can be replaced by 0xF7, saving 5 bytes.
+ * The used codes themselves are also placed in the table so that the
+ * decompression can work without "special cases".
+ * Applied to kernel symbols, this usually produces a compression ratio
+ * of about 50%.
+ *
*/
#include <stdio.h>
@@ -20,28 +29,38 @@
#include <string.h>
#include <ctype.h>
-/* compression tunning settings */
+/* maximum token length used. It doesn't pay to increase it a lot, because
+ * very long substrings probably don't repeat themselves too often. */
#define MAX_TOK_SIZE 11
#define KSYM_NAME_LEN 127
/* we use only a subset of the complete symbol table to gather the token count,
- to speed up compression, at the expense of a little compression ratio
-*/
+ * to speed up compression, at the expense of a little compression ratio */
#define WORKING_SET 1024
+
+/* first find the best token only on the list of tokens that would profit more
+ * than GOOD_BAD_THRESHOLD. Only if this list is empty go to the "bad" list.
+ * Increasing this value will put less tokens on the "good" list, so the search
+ * is faster. However, if the good list runs out of tokens, we must painfully
+ * search the bad list. */
#define GOOD_BAD_THRESHOLD 10
+/* token hash parameters */
#define HASH_BITS 18
#define HASH_TABLE_SIZE (1 << HASH_BITS)
#define HASH_MASK (HASH_TABLE_SIZE - 1)
#define HASH_BASE_OFFSET 2166136261U
#define HASH_FOLD(a) ((a)&(HASH_MASK))
+/* flags to mark symbols */
+#define SYM_FLAG_VALID 1
+#define SYM_FLAG_SAMPLED 2
+#define SYM_FLAG_EXPORTED 4
struct sym_entry {
unsigned long long addr;
char type;
- char sample;
- char valid;
+ unsigned char flags;
unsigned char len;
unsigned char *sym;
};
@@ -49,23 +68,28 @@ struct sym_entry {
static struct sym_entry *table;
static int size, cnt;
-static unsigned long long _stext, _etext, _sinittext, _einittext;
+static unsigned long long _stext, _etext, _sinittext, _einittext, _start_ksymtab, _stop_ksymtab;
static int all_symbols = 0;
+/* aray of pointers into the symbol table sorted by name */
+static struct sym_entry **sorted_table;
struct token {
unsigned char data[MAX_TOK_SIZE];
unsigned char len;
+ /* profit: the number of bytes that could be saved by inserting this
+ * token into the table */
int profit;
- struct token *next;
- struct token *right;
- struct token *left;
- struct token *smaller;
+ struct token *next; /* next token on the hash list */
+ struct token *right; /* next token on the good/bad list */
+ struct token *left; /* previous token on the good/bad list */
+ struct token *smaller; /* token that is less one letter than this one */
};
struct token bad_head, good_head;
struct token *hash_table[HASH_TABLE_SIZE];
+/* the table that holds the result of the compression */
unsigned char best_table[256][MAX_TOK_SIZE+1];
unsigned char best_table_len[256];
@@ -101,6 +125,10 @@ read_symbol(FILE *in, struct sym_entry *
_sinittext = s->addr;
else if (strcmp(str, "_einittext") == 0)
_einittext = s->addr;
+ else if (strcmp(str, "__start___ksymtab") == 0)
+ _start_ksymtab = s->addr;
+ else if (strcmp(str, "__stop___ksymtab") == 0)
+ _stop_ksymtab = s->addr;
else if (toupper(s->type) == 'A' || toupper(s->type) == 'U')
return -1;
@@ -126,7 +154,10 @@ symbol_valid(struct sym_entry *s)
if (strstr(s->sym, "_compiled.") ||
strcmp(s->sym, "kallsyms_addresses") == 0 ||
strcmp(s->sym, "kallsyms_num_syms") == 0 ||
- strcmp(s->sym, "kallsyms_names") == 0)
+ strcmp(s->sym, "kallsyms_names") == 0 ||
+ strcmp(s->sym, "kallsyms_markers") == 0 ||
+ strcmp(s->sym, "kallsyms_token_table") == 0 ||
+ strcmp(s->sym, "kallsyms_token_index") == 0)
return 0;
/* Exclude linker generated symbols which vary between passes */
@@ -161,16 +192,21 @@ static void output_label(char *label)
printf("%s:\n",label);
}
+/* uncompress a compressed symbol. When this function is called, the best table
+ * might still be compressed itself, so the function needs to be recursive */
static int expand_symbol(unsigned char *data, int len, char *result)
{
int c, rlen, total=0;
while (len) {
c = *data;
+ /* if the table holds a single char that is the same as the one
+ * we are looking for, then end the search */
if (best_table[c][0]==c && best_table_len[c]==1) {
*result++ = c;
total++;
} else {
+ /* if not, recurse and expand */
rlen = expand_symbol(best_table[c], best_table_len[c], result);
total += rlen;
result += rlen;
@@ -205,7 +241,7 @@ write_src(void)
output_label("kallsyms_addresses");
valid = 0;
for (i = 0; i < cnt; i++) {
- if (table[i].valid) {
+ if (table[i].flags & SYM_FLAG_VALID) {
printf("\tPTR\t%#llx\n", table[i].addr);
valid++;
}
@@ -216,6 +252,8 @@ write_src(void)
printf("\tPTR\t%d\n", valid);
printf("\n");
+ /* table of offset markers, that give the offset in the compressed stream
+ * every 256 symbols */
markers = (unsigned int *) malloc(sizeof(unsigned int)*((valid + 255) / 256));
output_label("kallsyms_names");
@@ -223,13 +261,15 @@ write_src(void)
off = 0;
for (i = 0; i < cnt; i++) {
- if (!table[i].valid)
+ if (!(table[i].flags & SYM_FLAG_VALID))
continue;
if ((valid & 0xFF) == 0)
markers[valid >> 8] = off;
- printf("\t.byte 0x%02x", table[i].len);
+ k = table[i].len;
+ if (table[i].flags & SYM_FLAG_EXPORTED) k |= 0x80;
+ printf("\t.byte 0x%02x", k);
for (k = 0; k < table[i].len; k++)
printf(", 0x%02x", table[i].sym[k]);
printf("\n");
@@ -244,14 +284,15 @@ write_src(void)
printf("\tPTR\t%d\n", markers[i]);
printf("\n");
+ free(markers);
+
output_label("kallsyms_token_table");
off = 0;
for (i = 0; i < 256; i++) {
best_idx[i] = off;
expand_symbol(best_table[i],best_table_len[i],buf);
- k = strlen(buf);
- printf("\t.byte 0x%02x\n\t.ascii\t\"%s\"\n", k, buf);
- off += k + 1;
+ printf("\t.asciz\t\"%s\"\n", buf);
+ off += strlen(buf) + 1;
}
printf("\n");
@@ -280,6 +321,7 @@ static unsigned int hash_token(unsigned
return HASH_FOLD(hash);
}
+/* find a token given its data and hash value */
static struct token *find_token_hash(unsigned char *data, int len, unsigned int hash)
{
struct token *ptr;
@@ -309,6 +351,9 @@ static inline void remove_token_from_gro
ptr->right->left = ptr->left;
}
+
+/* build the counts for all the tokens that start with "data", and have lengths
+ * from 2 to "len" */
static void learn_token(unsigned char *data, int len)
{
struct token *ptr,*last_ptr;
@@ -319,6 +364,7 @@ static void learn_token(unsigned char *d
if (len > MAX_TOK_SIZE)
len = MAX_TOK_SIZE;
+ /* calculate and store the hash values for all the sub-tokens */
hash = rehash_token(hash, data[0]);
for (i = 2; i <= len; i++) {
hash = rehash_token(hash, data[i-1]);
@@ -334,10 +380,19 @@ static void learn_token(unsigned char *d
if (!ptr) ptr = find_token_hash(data, i, hash);
if (!ptr) {
+ /* create a new token entry */
ptr = (struct token *) malloc(sizeof(*ptr));
+
memcpy(ptr->data, data, i);
ptr->len = i;
+
+ /* when we create an entry, its profit is 0 because
+ * we also take into account the size of the token on
+ * the compressed table. We then subtract GOOD_BAD_THRESHOLD
+ * so that the test to see if this token belongs to
+ * the good or bad list is a comparison to zero */
ptr->profit = -GOOD_BAD_THRESHOLD;
+
ptr->next = hash_table[hash];
hash_table[hash] = ptr;
@@ -346,11 +401,13 @@ static void learn_token(unsigned char *d
ptr->smaller = NULL;
} else {
newprofit = ptr->profit + (ptr->len - 1);
+ /* check to see if this token needs to be moved to a
+ * different list */
if((ptr->profit < 0) && (newprofit >= 0)) {
remove_token_from_group(ptr);
insert_token_in_group(&good_head,ptr);
}
- ptr->profit = newprofit;
+ ptr->profit = newprofit;
}
if (last_ptr) last_ptr->smaller = ptr;
@@ -360,6 +417,10 @@ static void learn_token(unsigned char *d
}
}
+/* decrease the counts for all the tokens that start with "data", and have lengths
+ * from 2 to "len". This function is much simpler than learn_token because we have
+ * more guarantees (the tokens exist, the ->smaller pointer is set, etc.)
+ * The two separate functions exist only because of compression performance */
static void forget_token(unsigned char *data, int len)
{
struct token *ptr;
@@ -384,6 +445,7 @@ static void forget_token(unsigned char *
}
}
+/* count all the possible tokens in a symbol */
static void learn_symbol(unsigned char *symbol, int len)
{
int i;
@@ -392,6 +454,7 @@ static void learn_symbol(unsigned char *
learn_token(symbol + i, len - i);
}
+/* decrease the count for all the possible tokens in a symbol */
static void forget_symbol(unsigned char *symbol, int len)
{
int i;
@@ -400,49 +463,98 @@ static void forget_symbol(unsigned char
forget_token(symbol + i, len - i);
}
+static int symbol_sort(const void *a, const void *b)
+{
+ return strcmp( (*((struct sym_entry **) a))->sym,
+ (*((struct sym_entry **) b))->sym );
+}
+
+
+/* find out if a symbol is exported. Exported symbols have a corresponding
+ * __ksymtab_<symbol> entry and their addresses are between __start___ksymtab
+ * and __stop___ksymtab */
+static int is_exported(char *name)
+{
+ struct sym_entry key, *ksym, **result;
+ char buf[KSYM_NAME_LEN+32];
+
+ sprintf(buf, "__ksymtab_%s", name);
+ key.sym = buf;
+
+ ksym = &key;
+ result = bsearch(&ksym, sorted_table, cnt,
+ sizeof(struct sym_entry *), symbol_sort);
+
+ if(!result) return 0;
+
+ ksym = *result;
+
+ return ((ksym->addr >= _start_ksymtab) && (ksym->addr < _stop_ksymtab));
+}
+
+/* set all the symbol flags and do the initial token count */
static void build_initial_tok_table(void)
{
int i, use_it, valid;
+ /* build a sorted symbol pointer array so that searching a particular
+ * symbol is faster */
+ sorted_table = (struct sym_entry **) malloc(sizeof(struct sym_entry *) * cnt);
+ for (i = 0; i < cnt; i++)
+ sorted_table[i] = &table[i];
+ qsort(sorted_table, cnt, sizeof(struct sym_entry *), symbol_sort);
+
valid = 0;
for (i = 0; i < cnt; i++) {
- table[i].valid = symbol_valid(&table[i]);
- if (table[i].valid) valid++;
+ table[i].flags = 0;
+ if ( symbol_valid(&table[i]) ) {
+ table[i].flags |= SYM_FLAG_VALID;
+ valid++;
+ }
}
use_it = 0;
for (i = 0; i < cnt; i++) {
- table[i].sample = 0;
- if (table[i].valid) {
+ if (table[i].flags & SYM_FLAG_VALID) {
+
use_it += WORKING_SET;
+
if (use_it >= valid) {
- table[i].sample = 1;
+ table[i].flags |= SYM_FLAG_SAMPLED;
use_it -= valid;
}
+
+ if( is_exported(table[i].sym) )
+ table[i].flags |= SYM_FLAG_EXPORTED;
}
- if (table[i].sample)
+ if (table[i].flags & SYM_FLAG_SAMPLED)
learn_symbol(table[i].sym, table[i].len);
}
}
+/* replace a given token in all the valid symbols. Use the sampled symbols
+ * to update the counts */
static void compress_symbols(unsigned char *str, int tlen, int idx)
{
int i, len, learn, size;
unsigned char *p;
for (i = 0; i < cnt; i++) {
- if (!table[i].valid) continue;
+
+ if (!(table[i].flags & SYM_FLAG_VALID)) continue;
len = table[i].len;
learn = 0;
p = table[i].sym;
do {
+ /* find the token on the symbol */
p = (unsigned char *) strstr((char *) p, (char *) str);
if (!p) break;
if (!learn) {
- if (table[i].sample)
+ /* if this symbol was used to count, decrease it */
+ if (table[i].flags & SYM_FLAG_SAMPLED)
forget_symbol(table[i].sym, len);
learn = 1;
}
@@ -457,11 +569,14 @@ static void compress_symbols(unsigned ch
if(learn) {
table[i].len = len;
- if(table[i].sample) learn_symbol(table[i].sym, len);
+ /* if this symbol was used to count, learn it again */
+ if(table[i].flags & SYM_FLAG_SAMPLED)
+ learn_symbol(table[i].sym, len);
}
}
}
+/* search the token with the maximum profit */
static struct token *find_best_token(void)
{
struct token *ptr,*best,*head;
@@ -486,29 +601,37 @@ static struct token *find_best_token(voi
return best;
}
+/* this is the core of the algorithm: calculate the "best" table */
static void optimize_result(void)
{
struct token *best;
int i;
/* using the '\0' symbol last allows compress_symbols to use standard
- fast string functions
- */
+ * fast string functions */
for (i = 255; i >= 0; i--) {
+
+ /* if this table slot is empty (i.e. it is not used by an
+ * actual original char code) */
if (!best_table_len[i]) {
+
+ /* find the token with the greatest profit value */
best = find_best_token();
+ /* place it in the "best" table */
best_table_len[i] = best->len;
memcpy(best_table[i], best->data, best_table_len[i]);
/* zero terminate the token so that we can use strstr
in compress_symbols */
best_table[i][best_table_len[i]]='\0';
+ /* replace this token in all the valid symbols */
compress_symbols(best_table[i], best_table_len[i], i);
}
}
}
+/* start by placing the symbols that are actually used on the table */
static void insert_real_symbols_in_table(void)
{
int i, j, c;
@@ -517,7 +640,7 @@ static void insert_real_symbols_in_table
memset(best_table_len, 0, sizeof(best_table_len));
for (i = 0; i < cnt; i++) {
- if (table[i].valid) {
+ if (table[i].flags & SYM_FLAG_VALID) {
for (j = 0; j < table[i].len; j++) {
c = table[i].sym[j];
best_table[c][0]=c;
end of thread, other threads:[~2004-09-03 3:08 UTC | newest]
Thread overview: 12+ messages
2004-08-31 20:26 [PATCH] kallsyms: speed up /proc/kallsyms Paulo Marques
2004-09-01 5:24 ` Rusty Russell
2004-09-01 11:17 ` Paulo Marques
2004-09-01 19:27 ` Sam Ravnborg
2004-09-01 19:44 ` Paulo Marques
2004-09-01 19:51 ` Sam Ravnborg
2004-09-02 12:05 ` Paulo Marques
2004-09-02 22:17 ` Sam Ravnborg
2004-09-03 1:31 ` pmarques
2004-09-03 1:31 ` Andrew Morton
2004-09-03 2:59 ` [PATCH] kallsyms: correct type char in /proc/kallsyms Paulo Marques
2004-09-01 11:38 ` [PATCH] kallsyms: speed up /proc/kallsyms Paulo Marques