[PATCH] codeparser: Call intern over the set contents for better cache performance

From: Richard Purdie <richard.purdie@linuxfoundation.org>
To: bitbake-devel <bitbake-devel@lists.openembedded.org>
Subject: [PATCH] codeparser: Call intern over the set contents for better cache performance
Date: Sun, 11 Mar 2012 14:36:44 +0000
Message-ID: <1331476604.15192.2.camel@ted>

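The pickle/unpickle process does not call intern() on the strings held in
the "refs" and "execs" sets, so identical strings end up duplicated both in
memory and in the cache file. Intern the set contents before saving the
merged cache. See the comment in the code in the commit for more information.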

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
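As a rough illustration of the effect described in the code comment, here is a minimal standalone sketch. It is not the bitbake code: it assumes Python 3 (where the builtin is sys.intern() rather than intern()) and CPython behaviour, and the names intern_set, make_cache and distinct_string_objects are hypothetical helpers made up for this example. It builds a toy cache whose sets hold equal but separate string objects, then compares the pickled size and the number of distinct string objects after a reload, with and without interning.

```python
import pickle
import sys


def intern_set(items):
    """Return a new set whose members have been passed through sys.intern()."""
    return {sys.intern(i) for i in items}


def make_cache(entries):
    """Build a toy cache in which every entry holds an equal but separate string."""
    cache = {}
    for h in range(entries):
        # join() builds a fresh str object on each call (CPython behaviour),
        # standing in for strings that are equal but not shared.
        name = "".join(["bb.utils.", "contains"])
        cache[h] = {"refs": {name}, "execs": set()}
    return cache


def distinct_string_objects(cache):
    """Count string objects by identity across every set in the cache."""
    return len({id(s) for entry in cache.values()
                for key in ("refs", "execs") for s in entry[key]})


raw = make_cache(100)
# Same data, but with every set rebuilt from interned strings.
interned = {h: {k: intern_set(v) for k, v in entry.items()}
            for h, entry in raw.items()}

raw_blob = pickle.dumps(raw, -1)
interned_blob = pickle.dumps(interned, -1)

# Pickled sizes, then distinct string objects after reloading each blob.
print(len(raw_blob), len(interned_blob))
print(distinct_string_objects(pickle.loads(raw_blob)),
      distinct_string_objects(pickle.loads(interned_blob)))
```

The saving comes from pickle's memo, which is keyed on object identity: once the strings are shared, each one is written in full only once and later occurrences become short back-references, which also reload as a single shared object.
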
 lib/bb/codeparser.py |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/lib/bb/codeparser.py b/lib/bb/codeparser.py
index 04a34f9..af2e194 100644
--- a/lib/bb/codeparser.py
+++ b/lib/bb/codeparser.py
@@ -98,6 +98,12 @@ def parser_cache_save(d):
     bb.utils.unlockfile(lf)
     bb.utils.unlockfile(glf)
 
+def internSet(items):
+    new = set()
+    for i in items:
+        new.add(intern(i))
+    return new
+
 def parser_cache_savemerge(d):
     cachefile = parser_cachefile(d)
     if not cachefile:
@@ -133,6 +139,21 @@ def parser_cache_savemerge(d):
                 data[1][h] = extradata[1][h]
         os.unlink(f)
 
+    # When the dicts are originally created, Python calls intern() on the set keys,
+    # which significantly improves memory usage. Sadly, the pickle/unpickle process
+    # doesn't call intern() on the keys, so the same strings end up duplicated in
+    # memory. This also means pickle saves the same string multiple times in the
+    # cache file. By interning the data here, the cache file shrinks dramatically,
+    # meaning faster load times, and the reloaded cache files also consume much less
+    # memory. This is worth any performance hit from these loops and the use of the
+    # intern() data storage.
+    # Python 3.x may behave better in this area.
+    for h in data[0]:
+        data[0][h]["refs"] = internSet(data[0][h]["refs"])
+        data[0][h]["execs"] = internSet(data[0][h]["execs"])
+    for h in data[1]:
+        data[1][h]["execs"] = internSet(data[1][h]["execs"])
+
     p = pickle.Pickler(file(cachefile, "wb"), -1)
     p.dump([data, PARSERCACHE_VERSION])
