* [PATCH] codeparser: Use hashlib for hashing, not hash()
From: Richard Purdie @ 2016-06-03 12:34 UTC (permalink / raw)
To: openembedded-core
"hash() is randomised by default each time you start a new instance of recent
versions (Python3.3+) to prevent dictionary insertion DOS attacks"
which means we need to use hashlib.md5 to get consistent values for
the codeparser cache under python 3. Prior to this, the codeparser
cache was effectively useless under python3 as shown by performance
regressions.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
diff --git a/bitbake/lib/bb/codeparser.py b/bitbake/lib/bb/codeparser.py
index b1d067a..00551ba 100644
--- a/bitbake/lib/bb/codeparser.py
+++ b/bitbake/lib/bb/codeparser.py
@@ -6,12 +6,16 @@ import pickle
import bb.pysh as pysh
import os.path
import bb.utils, bb.data
+import hashlib
from itertools import chain
from bb.pysh import pyshyacc, pyshlex, sherrors
from bb.cache import MultiProcessCache
logger = logging.getLogger('BitBake.CodeParser')
+def bbhash(s):
+    return hashlib.md5(s.encode("utf-8")).hexdigest()
+
def check_indent(codestr):
"""If the code is indented, add a top level piece of code to 'remove' the indentation"""
@@ -269,7 +273,7 @@ class PythonParser():
if not node or not node.strip():
return
- h = hash(str(node))
+ h = bbhash(str(node))
if h in codeparsercache.pythoncache:
self.references = set(codeparsercache.pythoncache[h].refs)
@@ -314,7 +318,7 @@ class ShellParser():
commands it executes.
"""
- h = hash(str(value))
+ h = bbhash(str(value))
if h in codeparsercache.shellcache:
self.execs = set(codeparsercache.shellcache[h].execs)
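The effect described in the commit message can be reproduced outside BitBake. The following is a minimal sketch, assuming Python 3.3 or later; `keys_in_fresh_interpreter`, `CHILD`, and the sample string are illustrative names that do not appear in the patch. It starts two interpreters with different `PYTHONHASHSEED` values and compares the built-in `hash()` with the md5 digest that `bbhash` returns:

```python
import hashlib
import os
import subprocess
import sys


def bbhash(s):
    # Same helper as in the patch: the digest depends only on the bytes of s.
    return hashlib.md5(s.encode("utf-8")).hexdigest()


# Child program (illustrative): print hash() and the md5 digest of argv[1].
CHILD = (
    "import hashlib, sys; s = sys.argv[1]; "
    "print(hash(s)); "
    "print(hashlib.md5(s.encode('utf-8')).hexdigest())"
)


def keys_in_fresh_interpreter(text, seed):
    """Return (hash(text), md5 hex digest) as computed by a new interpreter
    started with PYTHONHASHSEED set to the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output([sys.executable, "-c", CHILD, text], env=env)
    builtin, digest = out.decode("ascii").split()
    return int(builtin), digest


# Arbitrary sample input standing in for a cached code fragment.
snippet = "sysroot_stage_dir $from/usr/include $to/usr/include"
run_a = keys_in_fresh_interpreter(snippet, seed=1)
run_b = keys_in_fresh_interpreter(snippet, seed=2)

print("hash() equal across runs:", run_a[0] == run_b[0])
print("md5 equal across runs:   ", run_a[1] == run_b[1] == bbhash(snippet))
```

With differing seeds, the `hash()` values for the same string will almost certainly differ, so a key stored by one run is not found by the next, while the md5 digest stays the same in every process.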
* [PATCH] codeparser: Use hashlib for hashing, not hash()
From: Armin Kuster @ 2016-09-28 15:06 UTC (permalink / raw)
To: akuster, bitbake-devel
From: Richard Purdie <richard.purdie@linuxfoundation.org>
"hash() is randomised by default each time you start a new instance of recent
versions (Python3.3+) to prevent dictionary insertion DOS attacks"
which means we need to use hashlib.md5 to get consistent values for
the codeparser cache under python 3. Prior to this, the codeparser
cache was effectively useless under python3 as shown by performance
regressions.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 12d43cf45ba48e3587392f15315d92a1a53482ef)
We kept running into an issue where shell scripts were not
getting generated on 32bit hosts. It seemed to be caused by this code.
Example:
.../i686/build-project/tmp/work/cortexa5hf-vfp-neon-montavista-linux-gnueabi/libpthread-stubs/0.3-r0/temp/run.sysroot_stage_all.104712:
line 121: sysroot_stage_dir: command not found
WARNING:
..../i686/build-project/tmp/work/cortexa5hf-vfp-neon-montavista-linux-gnueabi/libpthread-stubs/0.3-r0/temp/run.sysroot_stage_all.104712:1
exit 127 from
sysroot_stage_dir $from/usr/include $to/usr/include
DEBUG: Python function do_populate_sysroot finished
ERROR: Function failed: sysroot_stage_all (log file is located at
.../i686/build-project/tmp/work/cortexa5hf-vfp-neon-montavista-linux-gnueabi/libpthread-stubs/0.3-r0/temp/log.do_populate_sysroot.104712)
Signed-off-by: Armin Kuster <akuster@mvista.com>
---
lib/bb/codeparser.py | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/lib/bb/codeparser.py b/lib/bb/codeparser.py
index 82a3af4..15492e4 100644
--- a/lib/bb/codeparser.py
+++ b/lib/bb/codeparser.py
@@ -3,6 +3,7 @@ import codegen
import logging
import os.path
import bb.utils, bb.data
+import hashlib
from itertools import chain
from pysh import pyshyacc, pyshlex, sherrors
from bb.cache import MultiProcessCache
@@ -16,6 +17,8 @@ except ImportError:
import pickle
logger.info('Importing cPickle failed. Falling back to a very slow implementation.')
+def bbhash(s):
+    return hashlib.md5(s.encode("utf-8")).hexdigest()
def check_indent(codestr):
"""If the code is indented, add a top level piece of code to 'remove' the indentation"""
@@ -248,7 +251,7 @@ class PythonParser():
if not node or not node.strip():
return
- h = hash(str(node))
+ h = bbhash(str(node))
if h in codeparsercache.pythoncache:
self.references = set(codeparsercache.pythoncache[h].refs)
@@ -291,7 +294,7 @@ class ShellParser():
commands it executes.
"""
- h = hash(str(value))
+ h = bbhash(str(value))
if h in codeparsercache.shellcache:
self.execs = set(codeparsercache.shellcache[h].execs)
--
2.7.4
* Re: [PATCH] codeparser: Use hashlib for hashing, not hash()
From: Richard Purdie @ 2016-09-28 15:14 UTC (permalink / raw)
To: Armin Kuster, akuster, bitbake-devel
On Wed, 2016-09-28 at 08:06 -0700, Armin Kuster wrote:
> From: Richard Purdie <richard.purdie@linuxfoundation.org>
>
> "hash() is randomised by default each time you start a new instance
> of recent versions (Python3.3+) to prevent dictionary insertion DOS
> attacks"
>
> which means we need to use hashlib.md5 to get consistent values for
> the codeparser cache under python 3. Prior to this, the codeparser
> cache was effectively useless under python3 as shown by performance
> regressions.
>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> (cherry picked from commit 12d43cf45ba48e3587392f15315d92a1a53482ef)
>
> We kept running into an issue where shell scripts were not
> getting generated on 32bit hosts. It seemed to be caused by this
> code.
I'm puzzled. This patch is in master. I'm therefore assuming you're
suggesting this for 1.30 but it doesn't say that anywhere.
1.30 isn't python3 based and therefore doesn't have the problem
described above?
Cheers,
Richard
* Re: [PATCH] codeparser: Use hashlib for hashing, not hash()
From: akuster @ 2016-09-28 15:25 UTC (permalink / raw)
To: Richard Purdie, Armin Kuster, bitbake-devel
On 09/28/2016 08:14 AM, Richard Purdie wrote:
> On Wed, 2016-09-28 at 08:06 -0700, Armin Kuster wrote:
>> From: Richard Purdie <richard.purdie@linuxfoundation.org>
>>
>> "hash() is randomised by default each time you start a new instance
>> of recent versions (Python3.3+) to prevent dictionary insertion DOS
>> attacks"
>>
>> which means we need to use hashlib.md5 to get consistent values for
>> the codeparser cache under python 3. Prior to this, the codeparser
>> cache was effectively useless under python3 as shown by performance
>> regressions.
>>
>> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
>> (cherry picked from commit 12d43cf45ba48e3587392f15315d92a1a53482ef)
>>
>> We kept running into an issue where shell scripts were not
>> getting generated on 32bit hosts. It seemed to be caused by this
>> code.
> I'm puzzled. This patch is in master. I'm therefore assuming you're
> suggesting this for 1.30 but it doesn't say that anywhere.
We made a similar change in our bitbake sources for 1.28, which needed to
be submitted per the Yocto compatible requirements. I could have sent
that one, but then you might have said "Hey, this looks like the one in
master, why are you not using that one?". I personally prefer to
reference official upstream fixes rather than hand-crafted ones that
achieve the same result. I can send you the one we did if you want.
>
> 1.30 isn't python3 based and therefore doesn't have the problem
> described above?
This fixed an issue we found building on 32bit hosts.
- Armin
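The thread does not establish why the change helped on 32-bit hosts. One property that is certain: the width of built-in hash values depends on the interpreter build, while an md5 hex digest is always 32 characters computed from the input bytes alone. A minimal sketch, assuming Python 3.2 or later for `sys.hash_info` (the interpreters discussed for 1.28/1.30 were Python 2, where this attribute is not available); the sample string is arbitrary:

```python
import hashlib
import sys


def bbhash(s):
    # Same helper as in the patch.
    return hashlib.md5(s.encode("utf-8")).hexdigest()


# Width in bits of values returned by hash() on this interpreter build.
hash_bits = sys.hash_info.width

# The md5 key has a fixed length regardless of the host word size.
key = bbhash("do_populate_sysroot")

print("hash() width on this build:", hash_bits, "bits")
print("bbhash key length:", len(key), "hex characters")
```

Keys derived this way do not depend on the host's word size, which at least removes one platform-dependent variable from the cache lookup.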
>
> Cheers,
>
> Richard
>
* Re: [PATCH] codeparser: Use hashlib for hashing, not hash()
From: Christopher Larson @ 2016-09-28 19:12 UTC (permalink / raw)
To: akuster; +Cc: bitbake-devel@lists.openembedded.org, Armin Kuster
On Wed, Sep 28, 2016 at 8:25 AM, akuster <akuster@mvista.com> wrote:
> On 09/28/2016 08:14 AM, Richard Purdie wrote:
>
>> On Wed, 2016-09-28 at 08:06 -0700, Armin Kuster wrote:
>>
>>> From: Richard Purdie <richard.purdie@linuxfoundation.org>
>>>
>>> "hash() is randomised by default each time you start a new instance
>>> of recent versions (Python3.3+) to prevent dictionary insertion DOS
>>> attacks"
>>>
>>> which means we need to use hashlib.md5 to get consistent values for
>>> the codeparser cache under python 3. Prior to this, the codeparser
>>> cache was effectively useless under python3 as shown by performance
>>> regressions.
>>>
>>> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
>>> (cherry picked from commit 12d43cf45ba48e3587392f15315d92a1a53482ef)
>>>
>>> We kept running into an issue where shell scripts were not
>>> getting generated on 32bit hosts. It seemed to be caused by this
>>> code.
>>>
>> I'm puzzled. This patch is in master. I'm therefore assuming you're
>> suggesting this for 1.30 but it doesn't say that anywhere.
>>
>
> We made a similar change in our bitbake sources for 1.28 which needed to
> be submitted per the Yocto compatible requirements. I could have sent
> that one but then you might have said "Hey, this looks like the one in
> Master, why are you not using that one?". I personally like to reference
> upstream official fixes than hand crafted ones that achieve the same
> result. I can send you the one we did if you want.
I think the problem is that you didn’t actually say which version/branch you
wanted it applied to, not which commit the patch references.
--
Christopher Larson
clarson at kergoth dot com
Founder - BitBake, OpenEmbedded, OpenZaurus
Maintainer - Tslib
Senior Software Engineer, Mentor Graphics
* Re: [PATCH] codeparser: Use hashlib for hashing, not hash()
From: akuster @ 2016-09-28 19:18 UTC (permalink / raw)
To: Christopher Larson; +Cc: bitbake-devel@lists.openembedded.org, Armin Kuster
On 09/28/2016 12:12 PM, Christopher Larson wrote:
> On Wed, Sep 28, 2016 at 8:25 AM, akuster <akuster@mvista.com> wrote:
>
>> On 09/28/2016 08:14 AM, Richard Purdie wrote:
>>
>>> On Wed, 2016-09-28 at 08:06 -0700, Armin Kuster wrote:
>>>
>>>> From: Richard Purdie <richard.purdie@linuxfoundation.org>
>>>>
>>>> "hash() is randomised by default each time you start a new instance
>>>> of recent versions (Python3.3+) to prevent dictionary insertion DOS
>>>> attacks"
>>>>
>>>> which means we need to use hashlib.md5 to get consistent values for
>>>> the codeparser cache under python 3. Prior to this, the codeparser
>>>> cache was effectively useless under python3 as shown by performance
>>>> regressions.
>>>>
>>>> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
>>>> (cherry picked from commit 12d43cf45ba48e3587392f15315d92a1a53482ef)
>>>>
>>>> We kept running into an issue where shell scripts were not
>>>> getting generated on 32bit hosts. It seemed to be caused by this
>>>> code.
>>>
>>> I'm puzzled. This patch is in master. I'm therefore assuming you're
>>> suggesting this for 1.30 but it doesn't say that anywhere.
>>
>> We made a similar change in our bitbake sources for 1.28 which needed to
>> be submitted per the Yocto compatible requirements. I could have sent
>> that one but then you might have said "Hey, this looks like the one in
>> Master, why are you not using that one?". I personally like to reference
>> upstream official fixes than hand crafted ones that achieve the same
>> result. I can send you the one we did if you want.
>
> I think the problem is you didn’t actually say what version/branch you
> wanted it applied to, not which commit the patch references..
Oh, I did, didn't I? That was lame. Apologies.
I can resend with the version added to the subject.
-armin
> --
> Christopher Larson
> clarson at kergoth dot com
> Founder - BitBake, OpenEmbedded, OpenZaurus
> Maintainer - Tslib
> Senior Software Engineer, Mentor Graphics