public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] Add kallsyms_lookup() result cache
@ 2004-06-19  0:43 Albert Cahalan
  2004-06-19  4:51 ` Herbert Xu
  2004-06-19 10:06 ` Andrew Morton
  0 siblings, 2 replies; 16+ messages in thread
From: Albert Cahalan @ 2004-06-19  0:43 UTC (permalink / raw)
  To: linux-kernel mailing list; +Cc: ak, bcasavan

Andi Kleen writes:
> Brent Casavant writes:

>> On 2.6 based systems, the top command utilizes /proc/[pid]/wchan to
>> determine WCHAN symbol name information.  This information is provided
>> by the kernel function kallsyms_lookup(), which expands a stem-compressed
>
> That sounds more like a bug in your top to me. /proc/*/wchan itself
> does not access kallsyms, it just outputs a number. 
>
> My top doesn't do that.

To get a working WCHAN column on a 2.6.xx kernel, your top
must do that. You also must have CONFIG_KALLSYMS enabled.

> Are you saying your top reads /proc/kallsyms on each redisplay? 
> That sounds completely wrong - it should only read the file once
> and cache it and then look the numbers it is reading from wchan
> in the cache.
>
> Doing the cache in the kernel is the wrong place. This should be fixed
> in user space.

No way, because:

1. kernel modules may be loaded or unloaded at any time
2. the /proc/*/wchan files don't provide both name and address

I'd be happy to make top (and the rest of procps) use a cache
if those problems were addressed. I need a signal sent on
module load/unload, or a real /proc/kallsyms st_mtime that I
can poll. I also need to have the numeric wchan address in
the /proc/*/wchan file, such that it is reliably the same
thing as the function name already found there.

Jesse Barnes writes:

> /proc/<pid>/wchan reports the function name as a string.
> You're arguing that doing that in the kernel is the wrong
> thing to do, right?  If so, it would make sense to change
> its behavior.  Either way, I guess we can fix top to use
> /proc/<pid>/stat instead, and lookup symbols in an external
> System.map file.

This is exactly what top does when /proc/*/wchan is missing.
It does not work well, due to kernel modules, and also due
to the inability to reliably determine if a System.map file
matches the currently running kernel. Also, simply parsing
that file is slow due to the size of it.

Brent Casavant also writes:

> The cache size of 32 items was chosen experimentally.  16 was
> too few to cache all the results on even a small system with
> approximatley 100 tasks, and 64 was too large for all but the
> most extreme cases.

I found that you need about 64 when using a simple hash,
but going to 256 is really cheap so why not? The old procps
symbol cache code (now unused) worked well. I notice that
you are scanning an array for ranges; it is better to hash
on exact address matches. Also, failures get cached and no
attempt was made to keep multiple items in a hash bucket.

Here it is. The rest of the code should be obvious.

typedef struct symb {
  const char *name;
  unsigned KLONG addr;
} symb;
...
static const symb fail = { "?", 0 };
...
static symb hashtable[256];
...
hash = (address >> 4) & 0xff;

> Further work obviously needs to be done for top itself to reduce CPU
> utilization even further, but this is a large first step.  Some quick
> experiments indicate that the slow path has now moved to other areas
> of code, which will be addressed in due course.

Jim and I have already used profilers on top. He mostly used
gprof. I mostly used gcc's ability to call some custom code on
function entry and exit, counting either the function calls or
the CPU cycles they consumed. So, lots of luck to you, but...

(eh, this assumes you're using the all-new top in procps 3.x.x)

If you want things fast, provide the DWARF2 debug info
needed to make use of /dev/mem.  >:-)  The LKCD project
might have a patch for this; they were converting from
STABS to DWARF2 last I heard.

Brent Casavant later writes:

> Hmm.  procps version 3.1.3 introduced the use of /proc/*/wchan
> where possible.  I'm using 2.0.13 normally, which appears to
> have the same behavior (can't find a changelong for the 2
> series at the moment).

Oh. Well. There's your problem. The procps-3.x.x code will only
read /proc/*/wchan for the processes shown on your screen.
It also doesn't corrupt wchan with a buffer overflow.



^ permalink raw reply	[flat|nested] 16+ messages in thread
* [PATCH] Add kallsyms_lookup() result cache
@ 2004-06-18 20:03 Brent Casavant
  2004-06-18 20:37 ` Martin Josefsson
  2004-06-18 22:03 ` Andi Kleen
  0 siblings, 2 replies; 16+ messages in thread
From: Brent Casavant @ 2004-06-18 20:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: rusty, ak

On 2.6 based systems, the top command utilizes /proc/[pid]/wchan to
determine WCHAN symbol name information.  This information is provided
by the kernel function kallsyms_lookup(), which expands a stem-compressed
list of kernel symbols in order to provide a symbol name.  The decompression
is relatively slow, performing a lot of throw-away work in the form of
strncpy() invocations whose results are overwritten immediately.  On systems
with many processes, this results in the kernel consuming a large amount of
time on top's behalf performing the decompression.  For example, on a
large SGI Altix system with 4500 current tasks, there is approximately
8 seconds worth of work to be done with each top screen update.

The following patch adds a result cache to kallsyms_lookup(), cacheing
previously decompressed symbol names and using them on subsequent
kallsyms_lookup() calls.  With this change, top with a 1 second update
now consumes only about 60% of a CPU on the previously mentioned sytem,
as opposed to the prior 800%.  A trivial test program which repeatedly
opens/reads/closes /proc/1/wchan improves from 160 iterations per second
to 30000.

The cache size of 32 items was chosen experimentally.  16 was too few
to cache all the results on even a small system with approximatley 100
tasks, and 64 was too large for all but the most extreme cases.  Cacheing
was chosen instead of improving the expansion algorithm due to the large
number of strlen() invocations which would have been necessary with that
solution.

Further work obviously needs to be done for top itself to reduce CPU
utilization even further, but this is a large first step.  Some quick
experiments indicate that the slow path has now moved to other areas
of code, which will be addressed in due course.

Signed-off-by: Brent Casavant <bcasavan@sgi.com>

--- linux.orig/kernel/kallsyms.c	2004-03-16 14:13:30.000000000 -0600
+++ linux/kernel/kallsyms.c	2004-06-18 14:28:17.000000000 -0500
@@ -56,6 +56,16 @@
 	return module_kallsyms_lookup_name(name);
 }

+/* Cache of uncompressed symbol names from kallsyms_lookup() */
+#define NAMECACHE_SIZE 32
+static int namecache_index;
+static struct {
+	unsigned long addr;
+	unsigned long nextsym;
+	char name[127];
+} namecache[NAMECACHE_SIZE];
+static rwlock_t namecache_lock;
+
 /* Lookup an address.  modname is set to NULL if it's in the kernel. */
 const char *kallsyms_lookup(unsigned long addr,
 			    unsigned long *symbolsize,
@@ -74,6 +84,22 @@
 		unsigned long symbol_end;
 		char *name = kallsyms_names;

+		/* Search for a cached response */
+		read_lock(&namecache_lock);
+		for (i = 0; i < NAMECACHE_SIZE; i++) {
+			if (namecache[i].addr &&
+			    namecache[i].addr <= addr &&
+			    addr < namecache[i].nextsym) {
+				strncpy(namebuf, namecache[i].name, 127);
+				*symbolsize = namecache[i].nextsym - namecache[i].addr;
+				*modname = NULL;
+				*offset = addr - namecache[i].addr;
+				read_unlock(&namecache_lock);
+				return namebuf;
+			}
+		}
+		read_unlock(&namecache_lock);
+
 		/* They're sorted, we could be clever here, but who cares? */
 		for (i = 0; i < kallsyms_num_syms; i++) {
 			if (kallsyms_addresses[i] > kallsyms_addresses[best] &&
@@ -99,6 +125,15 @@
 		*symbolsize = symbol_end - kallsyms_addresses[best];
 		*modname = NULL;
 		*offset = addr - kallsyms_addresses[best];
+
+		/* Cache the results */
+		write_lock(&namecache_lock);
+		namecache[namecache_index].addr = kallsyms_addresses[best];
+		namecache[namecache_index].nextsym = symbol_end;
+		strncpy(namecache[namecache_index].name, namebuf, 127);
+		namecache_index = (namecache_index + 1) % NAMECACHE_SIZE;
+		write_unlock(&namecache_lock);
+
 		return namebuf;
 	}


-- 
Brent Casavant             bcasavan@sgi.com        Forget bright-eyed and
Operating System Engineer  http://www.sgi.com/     bushy-tailed; I'm red-
Silicon Graphics, Inc.     44.8562N 93.1355W 860F  eyed and bushy-haired.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2004-06-22 20:15 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-06-19  0:43 [PATCH] Add kallsyms_lookup() result cache Albert Cahalan
2004-06-19  4:51 ` Herbert Xu
2004-06-19 12:03   ` Albert Cahalan
2004-06-19 10:06 ` Andrew Morton
2004-06-19 13:05   ` Albert Cahalan
2004-06-22 19:45     ` Brent Casavant
2004-06-22 20:10       ` Jesse Barnes
  -- strict thread matches above, loose matches on Subject: below --
2004-06-18 20:03 Brent Casavant
2004-06-18 20:37 ` Martin Josefsson
2004-06-18 20:55   ` Brent Casavant
2004-06-18 22:03 ` Andi Kleen
2004-06-18 23:26   ` Jesse Barnes
2004-06-18 23:56     ` Andi Kleen
2004-06-19  0:16       ` Jesse Barnes
2004-06-18 23:31   ` Brent Casavant
2004-06-18 23:59     ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox