* [OE-core] [PATCH] codeparser: Use hashlib for hashing, not hash()
From: Richard Purdie @ 2016-06-03 12:38 UTC
  To: bitbake-devel

"hash() is randomised by default each time you start a new instance of
recent versions (Python3.3+) to prevent dictionary insertion DOS attacks"

which means we need to use hashlib.md5 to get consistent values for
the codeparser cache under python 3. Prior to this, the codeparser
cache was effectively useless under python3 as shown by performance
regressions.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>

diff --git a/bitbake/lib/bb/codeparser.py b/bitbake/lib/bb/codeparser.py
index b1d067a..00551ba 100644
--- a/bitbake/lib/bb/codeparser.py
+++ b/bitbake/lib/bb/codeparser.py
@@ -6,12 +6,16 @@ import pickle
 import bb.pysh as pysh
 import os.path
 import bb.utils, bb.data
+import hashlib
 from itertools import chain
 from bb.pysh import pyshyacc, pyshlex, sherrors
 from bb.cache import MultiProcessCache
 
 logger = logging.getLogger('BitBake.CodeParser')
 
+def bbhash(s):
+    return hashlib.md5(s.encode("utf-8")).hexdigest()
+
 def check_indent(codestr):
     """If the code is indented, add a top level piece of code to 'remove' the indentation"""
 
@@ -269,7 +273,7 @@ class PythonParser():
         if not node or not node.strip():
             return
 
-        h = hash(str(node))
+        h = bbhash(str(node))
 
         if h in codeparsercache.pythoncache:
             self.references = set(codeparsercache.pythoncache[h].refs)
@@ -314,7 +318,7 @@ class ShellParser():
         commands it executes.
         """
 
-        h = hash(str(value))
+        h = bbhash(str(value))
 
         if h in codeparsercache.shellcache:
             self.execs = set(codeparsercache.shellcache[h].execs)
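
For anyone wanting to see the effect described in the commit message, here
is a minimal sketch, not part of the patch. It starts two fresh interpreters
with different PYTHONHASHSEED values (set explicitly so the result is
repeatable) and compares hash() with the md5 digest. The names CHILD and
hashes_in_new_interpreter, and the sample string, are made up for the
illustration; only bbhash() comes from the patch.

```python
import hashlib
import os
import subprocess
import sys


def bbhash(s):
    # Same helper as in the patch: the digest depends only on the input.
    return hashlib.md5(s.encode("utf-8")).hexdigest()


# Hypothetical one-liner run in a child interpreter; it prints hash(s)
# on the first line and the md5 hex digest on the second.
CHILD = (
    "import hashlib, sys; s = sys.argv[1]; "
    "print(hash(s)); "
    "print(hashlib.md5(s.encode('utf-8')).hexdigest())"
)


def hashes_in_new_interpreter(text, seed):
    """Return (hash(text), md5 hexdigest) as computed by a fresh interpreter."""
    # PYTHONHASHSEED pins the per-process string hash seed for the child.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", CHILD, text], env=env, universal_newlines=True
    )
    builtin, digest = out.split()
    return int(builtin), digest


if __name__ == "__main__":
    code = "do_compile() { oe_runmake }"  # arbitrary sample input
    first = hashes_in_new_interpreter(code, seed=1)
    second = hashes_in_new_interpreter(code, seed=2)
    print("hash() equal across runs:", first[0] == second[0])
    print("md5 equal across runs:  ", first[1] == second[1] == bbhash(code))
```

With randomisation left at its default, every run picks its own seed, so
keys written to the cache by one run would not be found by the next. The
md5 digest has no such dependency on the process.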


* Re: [OE-core] [PATCH] codeparser: Use hashlib for hashing, not hash()
From: Christopher Larson @ 2016-06-03 18:34 UTC
  To: Richard Purdie; +Cc: bitbake-devel

On Fri, Jun 3, 2016 at 5:38 AM, Richard Purdie <richard.purdie@linuxfoundation.org> wrote:

> "hash() is randomised by default each time you start a new instance of
> recent versions (Python3.3+) to prevent dictionary insertion DOS attacks"
>
> which means we need to use hashlib.md5 to get consistent values for
> the codeparser cache under python 3. Prior to this, the codeparser
> cache was effectively useless under python3 as shown by performance
> regressions.
>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
>

Looks good to me. I was actually wondering about the hashing mechanisms
just yesterday. Are the sstate checksums stable between Python 2 and
Python 3?
-- 
Christopher Larson
clarson at kergoth dot com
Founder - BitBake, OpenEmbedded, OpenZaurus
Maintainer - Tslib
Senior Software Engineer, Mentor Graphics


* Re: [OE-core] [PATCH] codeparser: Use hashlib for hashing, not hash()
From: Richard Purdie @ 2016-06-03 21:14 UTC
  To: Christopher Larson; +Cc: bitbake-devel

On Fri, 2016-06-03 at 11:34 -0700, Christopher Larson wrote:
> 
> On Fri, Jun 3, 2016 at 5:38 AM, Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
> > "hash() is randomised by default each time you start a new instance of
> > recent versions (Python3.3+) to prevent dictionary insertion DOS attacks"
> > 
> > which means we need to use hashlib.md5 to get consistent values for
> > the codeparser cache under python 3. Prior to this, the codeparser
> > cache was effectively useless under python3 as shown by performance
> > regressions.
> > 
> > Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> > 
> Looks good to me. I was actually wondering about the hashing
> mechanisms just yesterday. Are the sstate checksums stable between
> Python 2 and Python 3?

I haven't actually checked; we didn't intentionally break anything. It
doesn't matter that much, since the Python code differs between the two
versions and hence the checksums will change anyway.

Cheers,

Richard