* [PATCH] codeparser: Use hashlib for hashing, not hash()
@ 2016-09-28 15:06 Armin Kuster
  2016-09-28 15:14 ` Richard Purdie
  0 siblings, 1 reply; 6+ messages in thread
From: Armin Kuster @ 2016-09-28 15:06 UTC
  To: akuster, bitbake-devel

From: Richard Purdie <richard.purdie@linuxfoundation.org>

"hash() is randomised by default each time you start a new instance of recent
versions (Python3.3+) to prevent dictionary insertion DOS attacks"

which means we need to use hashlib.md5 to get consistent values for
the codeparser cache under Python 3. Prior to this, the codeparser
cache was effectively useless under Python 3, as shown by performance
regressions.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 12d43cf45ba48e3587392f15315d92a1a53482ef)
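A minimal sketch (not BitBake code) of why a cache that outlives the process that filled it stops hitting when keyed by hash(): one interpreter writes a pickled cache, a second one looks the same snippet up again. The fixed PYTHONHASHSEED values 1 and 2 are assumptions standing in for the random seed each new Python 3.3+ process picks when the variable is unset; the file name and the "md5"/"builtin" layout are illustrative only.

```python
import os
import subprocess
import sys
import tempfile

# Child script run in a fresh interpreter. In "write" mode it stores one entry
# under two keys (MD5 hex digest and built-in hash()); in "read" mode it loads
# the file and reports whether each key is found.
CHILD = r'''
import hashlib, pickle, sys
mode, path, snippet = sys.argv[1:4]
md5_key = hashlib.md5(snippet.encode("utf-8")).hexdigest()
if mode == "write":
    cache = {"md5": {md5_key: "parsed"}, "builtin": {hash(snippet): "parsed"}}
    with open(path, "wb") as f:
        pickle.dump(cache, f)
else:
    with open(path, "rb") as f:
        cache = pickle.load(f)
    print(md5_key in cache["md5"], hash(snippet) in cache["builtin"])
'''


def run_child(seed, *args):
    """Run CHILD in a new interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD, *args],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def cache_hits_after_restart(snippet, write_seed=1, read_seed=2):
    """Fill the cache in one process, look it up in another.

    Returns (md5_key_hit, builtin_hash_key_hit).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cache.pickle")
        run_child(write_seed, "write", path, snippet)
        md5_hit, builtin_hit = run_child(read_seed, "read", path, snippet)
    return md5_hit == "True", builtin_hit == "True"


if __name__ == "__main__":
    body = "sysroot_stage_dir $from/usr/include $to/usr/include"
    print(cache_hits_after_restart(body))        # different seeds
    print(cache_hits_after_restart(body, 1, 1))  # same seed
```

With different seeds this prints (True, False): the MD5 key survives the restart and the hash() key does not. With the same seed both keys hit, since hash() is only consistent for a given seed.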

We kept running into an issue on 32-bit hosts where the generated shell
scripts were missing functions they call. It seemed to be caused by this code.

Example:
.../i686/build-project/tmp/work/cortexa5hf-vfp-neon-montavista-linux-gnueabi/libpthread-stubs/0.3-r0/temp/run.sysroot_stage_all.104712: line 121: sysroot_stage_dir: command not found
WARNING: ..../i686/build-project/tmp/work/cortexa5hf-vfp-neon-montavista-linux-gnueabi/libpthread-stubs/0.3-r0/temp/run.sysroot_stage_all.104712:1 exit 127 from
  sysroot_stage_dir $from/usr/include $to/usr/include
DEBUG: Python function do_populate_sysroot finished
ERROR: Function failed: sysroot_stage_all (log file is located at .../i686/build-project/tmp/work/cortexa5hf-vfp-neon-montavista-linux-gnueabi/libpthread-stubs/0.3-r0/temp/log.do_populate_sysroot.104712)

Signed-off-by: Armin Kuster <akuster@mvista.com>
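The thread does not pin down why 32-bit hosts in particular were affected. One hypothesis consistent with the diff: the lookups only test whether h is already a key in the cache and never compare the original text, so two different function bodies with the same hash share one entry and the second one receives the first one's execs. A called function such as sysroot_stage_dir missing from execs would plausibly be left out of the generated run script. On a 32-bit Python build hash() is only 32 bits wide (see sys.hash_info.width), so collisions in a large cache are far likelier than with a 128-bit MD5 digest. The sketch below forces such a collision with a deliberately 8-bit hash; narrow_hash, toy_execs and the helper_N snippets are all hypothetical.

```python
import hashlib
from itertools import count


def narrow_hash(snippet):
    """Hypothetical 8-bit hash: the first byte of the MD5 digest.

    Deliberately tiny so a collision is easy to find; it stands in for any
    key space small enough for collisions to matter.
    """
    return hashlib.md5(snippet.encode("utf-8")).digest()[0]


def toy_execs(snippet):
    """Hypothetical parser: the helper functions a snippet calls."""
    return {word for word in snippet.split() if word.startswith("helper_")}


def find_collision():
    """Return two different snippets with the same narrow_hash.

    With 256 possible values, a repeat is guaranteed within 257 tries.
    """
    seen = {}
    for i in count():
        snippet = f"helper_{i} $from/usr/include $to/usr/include"
        key = narrow_hash(snippet)
        if key in seen:
            return seen[key], snippet
        seen[key] = snippet


cache = {}  # narrow_hash -> set of helpers; the snippet text itself is not stored


def cached_execs(snippet):
    key = narrow_hash(snippet)
    if key in cache:  # only the hash is compared, as in the lookups in the diff
        return cache[key]
    cache[key] = toy_execs(snippet)
    return cache[key]


first, second = find_collision()
print(cached_execs(first))   # the helper that first really calls
print(cached_execs(second))  # the same set again: second's own helper is never recorded
```

Keying on the MD5 hex digest, as the patch does, does not make collisions impossible, but makes them vanishingly unlikely at cache-sized inputs.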
---
 lib/bb/codeparser.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/bb/codeparser.py b/lib/bb/codeparser.py
index 82a3af4..15492e4 100644
--- a/lib/bb/codeparser.py
+++ b/lib/bb/codeparser.py
@@ -3,6 +3,7 @@ import codegen
 import logging
 import os.path
 import bb.utils, bb.data
+import hashlib
 from itertools import chain
 from pysh import pyshyacc, pyshlex, sherrors
 from bb.cache import MultiProcessCache
@@ -16,6 +17,8 @@ except ImportError:
     import pickle
     logger.info('Importing cPickle failed.  Falling back to a very slow implementation.')
 
+def bbhash(s):
+    return hashlib.md5(s.encode("utf-8")).hexdigest()
 
 def check_indent(codestr):
     """If the code is indented, add a top level piece of code to 'remove' the indentation"""
@@ -248,7 +251,7 @@ class PythonParser():
         if not node or not node.strip():
             return
 
-        h = hash(str(node))
+        h = bbhash(str(node))
 
         if h in codeparsercache.pythoncache:
             self.references = set(codeparsercache.pythoncache[h].refs)
@@ -291,7 +294,7 @@ class ShellParser():
         commands it executes.
         """
 
-        h = hash(str(value))
+        h = bbhash(str(value))
 
         if h in codeparsercache.shellcache:
             self.execs = set(codeparsercache.shellcache[h].execs)
-- 
2.7.4
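The bbhash helper added by the patch encodes the string first because, in Python 3, hashlib objects accept only bytes-like data. A short check of that and of the resulting key format; bbhash is copied from the diff and the rest is illustrative.

```python
import hashlib


def bbhash(s):
    # As in the patch: MD5 over the UTF-8 bytes, returned as a hex string.
    return hashlib.md5(s.encode("utf-8")).hexdigest()


try:
    hashlib.md5("echo hello")  # a str, not bytes
except TypeError as err:
    print("hashlib needs bytes:", err)

key = bbhash("abc")
print(key)       # 900150983cd24fb0d6963f7d28e17f72, the standard MD5 test vector for "abc"
print(len(key))  # 32 hex characters, i.e. a 128-bit digest
print(bbhash("echo café"))  # non-ASCII text hashes fine once UTF-8 encoded
```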



* [PATCH] codeparser: Use hashlib for hashing, not hash()
@ 2016-06-03 12:34 Richard Purdie
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Purdie @ 2016-06-03 12:34 UTC
  To: openembedded-core

"hash() is randomised by default each time you start a new instance of recent
versions (Python3.3+) to prevent dictionary insertion DOS attacks"

which means we need to use hashlib.md5 to get consistent values for
the codeparser cache under Python 3. Prior to this, the codeparser
cache was effectively useless under Python 3, as shown by performance
regressions.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>

diff --git a/bitbake/lib/bb/codeparser.py b/bitbake/lib/bb/codeparser.py
index b1d067a..00551ba 100644
--- a/bitbake/lib/bb/codeparser.py
+++ b/bitbake/lib/bb/codeparser.py
@@ -6,12 +6,16 @@ import pickle
 import bb.pysh as pysh
 import os.path
 import bb.utils, bb.data
+import hashlib
 from itertools import chain
 from bb.pysh import pyshyacc, pyshlex, sherrors
 from bb.cache import MultiProcessCache
 
 logger = logging.getLogger('BitBake.CodeParser')
 
+def bbhash(s):
+    return hashlib.md5(s.encode("utf-8")).hexdigest()
+
 def check_indent(codestr):
     """If the code is indented, add a top level piece of code to 'remove' the indentation"""
 
@@ -269,7 +273,7 @@ class PythonParser():
         if not node or not node.strip():
             return
 
-        h = hash(str(node))
+        h = bbhash(str(node))
 
         if h in codeparsercache.pythoncache:
             self.references = set(codeparsercache.pythoncache[h].refs)
@@ -314,7 +318,7 @@ class ShellParser():
         commands it executes.
         """
 
-        h = hash(str(value))
+        h = bbhash(str(value))
 
         if h in codeparsercache.shellcache:
             self.execs = set(codeparsercache.shellcache[h].execs)






Thread overview: 6+ messages
2016-09-28 15:06 [PATCH] codeparser: Use hashlib for hashing, not hash() Armin Kuster
2016-09-28 15:14 ` Richard Purdie
2016-09-28 15:25   ` akuster
2016-09-28 19:12     ` Christopher Larson
2016-09-28 19:18       ` akuster
  -- strict thread matches above, loose matches on Subject: below --
2016-06-03 12:34 Richard Purdie
