qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* Hash table based symbol lookup (was: Re: [Qemu-devel] [PATCH] Fix symbol lookup for mips64* targets)
@ 2008-10-06 10:29 Laurent Desnogues
  2008-10-06 16:11 ` Blue Swirl
  0 siblings, 1 reply; 7+ messages in thread
From: Laurent Desnogues @ 2008-10-06 10:29 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 457 bytes --]

On Fri, Oct 3, 2008 at 7:14 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>>
>>  I have a hash-table based implementation somewhere if someone
>>  wants to take a look.
>
> That would be interesting, please send.

Here you go.

Things to note:

  - HT building could be done by a call to syminfo_ht_build
    from elfload.c/machload.c
  - no cleanup code
  - use of malloc
  - only slightly tested

So basically this is only for review purposes :)


Laurent

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: disass-ht.patch --]
[-- Type: text/x-patch; name=disass-ht.patch, Size: 3594 bytes --]

Index: disas.c
===================================================================
--- disas.c	(revision 5430)
+++ disas.c	(working copy)
@@ -300,9 +300,111 @@
     }
 }
 
+#if 1
+/* symbol hash table to speed up lookup_symbol */
+#define SYMINFO_HT_SIZE  3121u
+
+typedef struct syminfo_ht_entry_s syminfo_ht_entry_t;
+
+static unsigned int syminfo_ht_hash(target_ulong addr)
+{
+    return addr % SYMINFO_HT_SIZE;
+}
+
+struct syminfo_ht_entry_s {
+    target_ulong        addr;
+    const char         *strtab;
+    struct syminfo     *s;
+    const Elf32_Sym    *sym;
+    syminfo_ht_entry_t *next;
+};
+
+static syminfo_ht_entry_t *syminfo_ht[SYMINFO_HT_SIZE];
+
+static const syminfo_ht_entry_t *syminfo_ht_search(target_ulong addr)
+{
+    const syminfo_ht_entry_t *syminfo_ht_entry;
+
+    for (syminfo_ht_entry = syminfo_ht[syminfo_ht_hash(addr)];
+         syminfo_ht_entry;
+         syminfo_ht_entry = syminfo_ht_entry->next) {
+        if (addr == syminfo_ht_entry->addr)
+            return syminfo_ht_entry;
+    }
+    return NULL;
+}
+
+static const char *syminfo_ht_lookup(target_ulong addr)
+{
+    const syminfo_ht_entry_t *syminfo_ht_entry;
+
+    syminfo_ht_entry = syminfo_ht_search(addr);
+    if (syminfo_ht_entry)
+        return syminfo_ht_entry->strtab + syminfo_ht_entry->sym->st_name;
+    return "";
+}
+
+static void syminfo_ht_build(void)
+{
+  unsigned int i, key;
+    struct syminfo *s;
+    /* Hack, because we know this is x86. */
+    Elf32_Sym *sym;
+    target_ulong addr, j;
+    const syminfo_ht_entry_t *syminfo_ht_entry;
+    syminfo_ht_entry_t *new_entry;
+
+    for (s = syminfos; s; s = s->next) {
+        sym = s->disas_symtab;
+        for (i = 0; i < s->disas_num_syms; i++) {
+            if (sym[i].st_shndx == SHN_UNDEF
+                || sym[i].st_shndx >= SHN_LORESERVE)
+                continue;
+
+            if (ELF_ST_TYPE(sym[i].st_info) != STT_FUNC)
+                continue;
+
+            addr = sym[i].st_value;
+#if defined(TARGET_ARM) || defined (TARGET_MIPS)
+            /* The bottom address bit marks a Thumb or MIPS16 symbol.  */
+            addr &= ~(target_ulong)1;
+#endif
+            /* Enter [addr, addr + sym[i].st_size[ in the hash table. */
+            for (j = addr; j < addr + sym[i].st_size; j++) {
+                syminfo_ht_entry = syminfo_ht_search(j);
+                /* If the addr is already in, keep the first. */
+                if (!syminfo_ht_entry) {
+                    /* Add the symbol at the front of the list. */
+                    key = syminfo_ht_hash(j);
+                    new_entry = malloc(sizeof(syminfo_ht_entry_t));
+                    new_entry->addr   = j;
+                    new_entry->strtab = s->disas_strtab;
+                    new_entry->sym    = &sym[i];
+                    new_entry->next   = syminfo_ht[key];
+                    syminfo_ht[key] = new_entry;
+                }
+            }
+        }
+    }
+}
+
 /* Look up symbol for debugging purpose.  Returns "" if unknown. */
 const char *lookup_symbol(target_ulong orig_addr)
 {
+    static int ht_built = 0;
+
+    if (!ht_built) {
+        ht_built = 1;
+        syminfo_ht_build();
+    }
+    return syminfo_ht_lookup(orig_addr);
+}
+
+#else
+
+/* Look up symbol for debugging purpose.  Returns "" if unknown. */
+const char *lookup_symbol(target_ulong orig_addr)
+{
     unsigned int i;
     /* Hack, because we know this is x86. */
     Elf32_Sym *sym;
@@ -332,6 +434,8 @@
     return "";
 }
 
+#endif
+
 #if !defined(CONFIG_USER_ONLY)
 
 void term_vprintf(const char *fmt, va_list ap);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Hash table based symbol lookup (was: Re: [Qemu-devel] [PATCH] Fix symbol lookup for mips64* targets)
  2008-10-06 10:29 Hash table based symbol lookup (was: Re: [Qemu-devel] [PATCH] Fix symbol lookup for mips64* targets) Laurent Desnogues
@ 2008-10-06 16:11 ` Blue Swirl
  2008-10-06 16:28   ` Laurent Desnogues
  0 siblings, 1 reply; 7+ messages in thread
From: Blue Swirl @ 2008-10-06 16:11 UTC (permalink / raw)
  To: qemu-devel

On 10/6/08, Laurent Desnogues <laurent.desnogues@gmail.com> wrote:
> On Fri, Oct 3, 2008 at 7:14 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>  >>
>  >>  I have a hash-table based implementation somewhere if someone
>  >>  wants to take a look.
>  >
>  > That would be interesting, please send.
>
>  Here you go.
>
>  Things to note:
>
>   - HT building could be done by a call to syminfo_ht_build
>     from elfload.c/machload.c
>   - no cleanup code
>   - use of malloc
>   - only slightly tested
>
>  So basically this is only for review purposes :)

I hope I'm wrong, but it looks like in the worst case, for each byte
that the symbol covers, this implementation can allocate a structure
of tens of bytes in size. I wonder what the symbols of 1.5MB
openbios-sparc64 would take.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Hash table based symbol lookup (was: Re: [Qemu-devel] [PATCH] Fix symbol lookup for mips64* targets)
  2008-10-06 16:11 ` Blue Swirl
@ 2008-10-06 16:28   ` Laurent Desnogues
  2008-10-07  6:48     ` Laurent Desnogues
  0 siblings, 1 reply; 7+ messages in thread
From: Laurent Desnogues @ 2008-10-06 16:28 UTC (permalink / raw)
  To: qemu-devel

On Mon, Oct 6, 2008 at 6:11 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>
> I hope I'm wrong, but it looks like in the worst case, for each byte
> that the symbol covers, this implementation can allocate a structure
> of tens of bytes in size. I wonder what the symbols of 1.5MB
> openbios-sparc64 would take.

No, you're right:  for each byte of a function a structure of about
5 * ptr size is allocated.  In your case, that'd be about 60 MB.
It's bad and can probably be reduced.  But just try to run in
debug mode (-d) any application that has thousands of
symbols and you'll soon see you have to disable symbol
lookup as its complexity is linear.

I can reduce memory consumption but its size will still be a
multiple of the text segment size.


Laurent

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Hash table based symbol lookup (was: Re: [Qemu-devel] [PATCH] Fix symbol lookup for mips64* targets)
  2008-10-06 16:28   ` Laurent Desnogues
@ 2008-10-07  6:48     ` Laurent Desnogues
  2008-10-07 17:50       ` Blue Swirl
  0 siblings, 1 reply; 7+ messages in thread
From: Laurent Desnogues @ 2008-10-07  6:48 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 306 bytes --]

Here is a revised version that uses less memory while keeping
the same complexity.

This could be made less memory hungry by using some
binary tree data structure at the expense of code complexity.
I won't go down that road as anyway this stuff is mainly used
for debugging and tracing purposes.


Laurent

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: disass-ht2.patch --]
[-- Type: text/x-patch; name=disass-ht2.patch, Size: 3462 bytes --]

Index: disas.c
===================================================================
--- disas.c	(revision 5430)
+++ disas.c	(working copy)
@@ -300,9 +300,108 @@
     }
 }
 
+#if 1
+/* symbol hash table to speed up lookup_symbol */
+#define SYMINFO_HT_SIZE  3121u
+
+typedef struct syminfo_ht_entry_s syminfo_ht_entry_t;
+
+static unsigned int syminfo_ht_hash(target_ulong addr)
+{
+    return addr % SYMINFO_HT_SIZE;
+}
+
+struct syminfo_ht_entry_s {
+    target_ulong        addr;
+    const char         *name;
+    syminfo_ht_entry_t *next;
+};
+
+static syminfo_ht_entry_t *syminfo_ht[SYMINFO_HT_SIZE];
+
+static const syminfo_ht_entry_t *syminfo_ht_search(target_ulong addr)
+{
+    const syminfo_ht_entry_t *syminfo_ht_entry;
+
+    for (syminfo_ht_entry = syminfo_ht[syminfo_ht_hash(addr)];
+         syminfo_ht_entry;
+         syminfo_ht_entry = syminfo_ht_entry->next) {
+        if (addr == syminfo_ht_entry->addr)
+            return syminfo_ht_entry;
+    }
+    return NULL;
+}
+
+static const char *syminfo_ht_lookup(target_ulong addr)
+{
+    const syminfo_ht_entry_t *syminfo_ht_entry;
+
+    syminfo_ht_entry = syminfo_ht_search(addr);
+    if (syminfo_ht_entry)
+        return syminfo_ht_entry->name;
+    return "";
+}
+
+static void syminfo_ht_build(void)
+{
+    unsigned int i, key;
+    struct syminfo *s;
+    /* Hack, because we know this is x86. */
+    Elf32_Sym *sym;
+    target_ulong addr, j;
+    const syminfo_ht_entry_t *syminfo_ht_entry;
+    syminfo_ht_entry_t *new_entry;
+
+    for (s = syminfos; s; s = s->next) {
+        sym = s->disas_symtab;
+        for (i = 0; i < s->disas_num_syms; i++) {
+            if (sym[i].st_shndx == SHN_UNDEF
+                || sym[i].st_shndx >= SHN_LORESERVE)
+                continue;
+
+            if (ELF_ST_TYPE(sym[i].st_info) != STT_FUNC)
+                continue;
+
+            addr = sym[i].st_value;
+#if defined(TARGET_ARM) || defined (TARGET_MIPS)
+            /* The bottom address bit marks a Thumb or MIPS16 symbol.  */
+            addr &= ~(target_ulong)1;
+#endif
+            /* Enter [addr, addr + sym[i].st_size[ in the hash table. */
+            for (j = addr; j < addr + sym[i].st_size; j++) {
+                syminfo_ht_entry = syminfo_ht_search(j);
+                /* If the addr is already in, keep the first. */
+                if (!syminfo_ht_entry) {
+                    /* Add the symbol at the front of the list. */
+                    key = syminfo_ht_hash(j);
+                    new_entry = malloc(sizeof(syminfo_ht_entry_t));
+                    new_entry->addr = j;
+                    new_entry->name = s->disas_strtab + sym[i].st_name;
+                    new_entry->next = syminfo_ht[key];
+                    syminfo_ht[key] = new_entry;
+                }
+            }
+        }
+    }
+}
+
 /* Look up symbol for debugging purpose.  Returns "" if unknown. */
 const char *lookup_symbol(target_ulong orig_addr)
 {
+    static int ht_built = 0;
+
+    if (!ht_built) {
+        ht_built = 1;
+        syminfo_ht_build();
+    }
+    return syminfo_ht_lookup(orig_addr);
+}
+
+#else
+
+/* Look up symbol for debugging purpose.  Returns "" if unknown. */
+const char *lookup_symbol(target_ulong orig_addr)
+{
     unsigned int i;
     /* Hack, because we know this is x86. */
     Elf32_Sym *sym;
@@ -332,6 +431,8 @@
     return "";
 }
 
+#endif
+
 #if !defined(CONFIG_USER_ONLY)
 
 void term_vprintf(const char *fmt, va_list ap);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Hash table based symbol lookup (was: Re: [Qemu-devel] [PATCH] Fix symbol lookup for mips64* targets)
  2008-10-07  6:48     ` Laurent Desnogues
@ 2008-10-07 17:50       ` Blue Swirl
  2008-10-07 19:34         ` Laurent Desnogues
  0 siblings, 1 reply; 7+ messages in thread
From: Blue Swirl @ 2008-10-07 17:50 UTC (permalink / raw)
  To: qemu-devel

On 10/7/08, Laurent Desnogues <laurent.desnogues@gmail.com> wrote:
> Here is a revised version that uses less memory while keeping
>  the same complexity.
>
>  This could be made less memory hungry by using some
>  binary tree data structure at the expense of code complexity.
>  I won't go down that road as anyway this stuff is mainly used
>  for debugging and tracing purposes.

I agree that hash and binary tree would be too complex.

But what if you dropped the hash table and used a binary tree instead?
A binary tree would be both memory efficient (you could keep the
regions, maybe just use a pointer to the original symbol table) and
it's relatively fast. Maybe the original symbols could be sorted so
that the binary search could use the original table directly?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Hash table based symbol lookup (was: Re: [Qemu-devel] [PATCH] Fix symbol lookup for mips64* targets)
  2008-10-07 17:50       ` Blue Swirl
@ 2008-10-07 19:34         ` Laurent Desnogues
  2008-10-07 21:34           ` Laurent Desnogues
  0 siblings, 1 reply; 7+ messages in thread
From: Laurent Desnogues @ 2008-10-07 19:34 UTC (permalink / raw)
  To: qemu-devel

On Tue, Oct 7, 2008 at 7:50 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On 10/7/08, Laurent Desnogues <laurent.desnogues@gmail.com> wrote:
>>
>>  This could be made less memory hungry by using some
>>  binary tree data structure at the expense of code complexity.
>>  I won't go down that road as anyway this stuff is mainly used
>>  for debugging and tracing purposes.
>
> I agree that hash and binary tree would be too complex.
>
> But what if you dropped the hash table and used a binary tree instead?
> A binary tree would be both memory efficient (you could keep the
> regions, maybe just use a pointer to the original symbol table) and
> it's relatively fast. Maybe the original symbols could be sorted so
> that the binary search could use the original table directly?

I was indeed talking about replacing the whole hash table by a
binary tree (it has been proven that mixing both is not the way
to go;  cf Knuth).  The problem we have here is that we are
doing an interval search:  we are looking if an address belongs
to a function.

I will give that some thought, but don't hold your breath :-)


Laurent

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Hash table based symbol lookup (was: Re: [Qemu-devel] [PATCH] Fix symbol lookup for mips64* targets)
  2008-10-07 19:34         ` Laurent Desnogues
@ 2008-10-07 21:34           ` Laurent Desnogues
  0 siblings, 0 replies; 7+ messages in thread
From: Laurent Desnogues @ 2008-10-07 21:34 UTC (permalink / raw)
  To: qemu-devel

On Tue, Oct 7, 2008 at 9:34 PM, Laurent Desnogues
<laurent.desnogues@gmail.com> wrote:
> On Tue, Oct 7, 2008 at 7:50 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On 10/7/08, Laurent Desnogues <laurent.desnogues@gmail.com> wrote:
>>>
>>>  This could be made less memory hungry by using some
>>>  binary tree data structure at the expense of code complexity.
>>>  I won't go down that road as anyway this stuff is mainly used
>>>  for debugging and tracing purposes.
>>
>> I agree that hash and binary tree would be too complex.
>>
>> But what if you dropped the hash table and used a binary tree instead?
>> A binary tree would be both memory efficient (you could keep the
>> regions, maybe just use a pointer to the original symbol table) and
>> it's relatively fast. Maybe the original symbols could be sorted so
>> that the binary search could use the original table directly?
>
> I was indeed talking about replacing the whole hash table by a
> binary tree (it has been proven that mixing both is not the way
> to go;  cf Knuth).  The problem we have here is that we are
> doing an interval search:  we are looking if an address belongs
> to a function.
>
> I will give that some thought, but don't hold your breath :-)

It suddenly occurred to me a simple binary search could be a
good solution and should be easy to implement. The resulting
complexity and memory usage should be acceptable.  I will
give that a try with some huge executable.


Laurent

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-10-07 21:35 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-06 10:29 Hash table based symbol lookup (was: Re: [Qemu-devel] [PATCH] Fix symbol lookup for mips64* targets) Laurent Desnogues
2008-10-06 16:11 ` Blue Swirl
2008-10-06 16:28   ` Laurent Desnogues
2008-10-07  6:48     ` Laurent Desnogues
2008-10-07 17:50       ` Blue Swirl
2008-10-07 19:34         ` Laurent Desnogues
2008-10-07 21:34           ` Laurent Desnogues

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).