All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] codeparser: Call intern over the set contents for better cache performance
@ 2012-03-11 14:36 Richard Purdie
  0 siblings, 0 replies; only message in thread
From: Richard Purdie @ 2012-03-11 14:36 UTC (permalink / raw)
  To: bitbake-devel

See the comment in the code in the commit for more information.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 lib/bb/codeparser.py |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/lib/bb/codeparser.py b/lib/bb/codeparser.py
index 04a34f9..af2e194 100644
--- a/lib/bb/codeparser.py
+++ b/lib/bb/codeparser.py
@@ -98,6 +98,12 @@ def parser_cache_save(d):
     bb.utils.unlockfile(lf)
     bb.utils.unlockfile(glf)
 
+def internSet(items):
+    new = set()
+    for i in items:
+        new.add(intern(i))
+    return new
+
 def parser_cache_savemerge(d):
     cachefile = parser_cachefile(d)
     if not cachefile:
@@ -133,6 +139,21 @@ def parser_cache_savemerge(d):
                 data[1][h] = extradata[1][h]
         os.unlink(f)
 
+    # When the dicts are originally created, python calls intern() on the set keys
+    # which significantly improves memory usage. Sadly the pickle/unpickle process 
+    # doesn't call intern() on the keys and results in the same strings being duplicated
+    # in memory. This also means pickle will save the same string multiple times in 
+    # the cache file. By interning the data here, the cache file shrinks dramatically
+    # meaning faster load times and the reloaded cache files also consume much less 
+    # memory. This is worth any performance hit from this loops and the use of the 
+    # intern() data storage.
+    # Python 3.x may behave better in this area
+    for h in data[0]:
+        data[0][h]["refs"] = internSet(data[0][h]["refs"])
+        data[0][h]["execs"] = internSet(data[0][h]["execs"])
+    for h in data[1]:
+        data[1][h]["execs"] = internSet(data[1][h]["execs"])
+
     p = pickle.Pickler(file(cachefile, "wb"), -1)
     p.dump([data, PARSERCACHE_VERSION])
 





^ permalink raw reply related	[flat|nested] only message in thread

only message in thread, other threads:[~2012-03-11 14:45 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-11 14:36 [PATCH] codeparser: Call intern over the set contents for better cache performance Richard Purdie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.