Openembedded Core Discussions
* [PATCH 1/3] utils/md5_file: don't iterate line-by-line
@ 2018-08-13 17:20 Ross Burton
From: Ross Burton @ 2018-08-13 17:20 UTC
  To: openembedded-core

Opening a file in binary mode and iterating over it looks like the simple
solution, but iteration still splits on newlines. For binary files that isn't
useful, as the resulting chunks can be huge or tiny.
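
To illustrate the chunk-size problem, here is a minimal sketch that is not part of the patch. The payload is made up for the demonstration, and io.BytesIO stands in for a file opened with "rb", since both yield items that end at the next newline byte.

```python
import io

# Hypothetical payload: 100 two-byte "lines", then 1 MiB with no newline at all.
data = b"x\n" * 100 + b"\0" * (1024 * 1024)

# Iterating a binary stream splits on b"\n", exactly like "for line in f"
# on a file opened in "rb" mode.
sizes = [len(line) for line in io.BytesIO(data)]

print("chunks:", len(sizes))
print("smallest:", min(sizes), "largest:", max(sizes))
```

The chunk sizes depend entirely on where newline bytes happen to fall, so each update of the hash may receive a couple of bytes or the bulk of the file.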

Instead, let's be a bit more clever: we'll be MD5ing lots of files, but we don't
want to fill up memory, so use mmap() to map the file and read it in 8 KiB
blocks.
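
For anyone who wants to try the idea outside BitBake, the following is a standalone sketch under stated assumptions, not the code in the diff. The helper name md5_mmap and its block_size parameter are hypothetical. Where the patch catches ValueError for empty files, this sketch checks the file size first; that is a swapped-in variation, not the patch's behaviour.

```python
import hashlib
import mmap
import os
import tempfile

BLOCK_SIZE = 8192  # 8 KiB blocks, matching the description above


def md5_mmap(path, block_size=BLOCK_SIZE):
    """Hypothetical helper: MD5 hex digest of a file read via mmap in fixed blocks."""
    m = hashlib.md5()
    with open(path, "rb") as f:
        # Assumption of this sketch: test for an empty file up front,
        # because mmap() refuses to map a zero-length file.
        if os.fstat(f.fileno()).st_size == 0:
            return m.hexdigest()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # mm.read() returns b"" at the end of the mapping, which stops iter().
            for chunk in iter(lambda: mm.read(block_size), b""):
                m.update(chunk)
    return m.hexdigest()


# Usage: hash a temporary file and compare against hashing the bytes directly.
payload = b"binary\ndata\0" * 5000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
try:
    print(md5_mmap(tmp.name) == hashlib.md5(payload).hexdigest())
finally:
    os.unlink(tmp.name)
```

Every update except possibly the last receives exactly block_size bytes, regardless of where newline bytes fall in the file.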

Signed-off-by: Ross Burton <ross.burton@intel.com>
---
 bitbake/lib/bb/utils.py | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/bitbake/lib/bb/utils.py b/bitbake/lib/bb/utils.py
index 9903183213b..b20cdabcf01 100644
--- a/bitbake/lib/bb/utils.py
+++ b/bitbake/lib/bb/utils.py
@@ -524,12 +524,17 @@ def md5_file(filename):
     """
     Return the hex string representation of the MD5 checksum of filename.
     """
-    import hashlib
-    m = hashlib.md5()
+    import hashlib, mmap
 
     with open(filename, "rb") as f:
-        for line in f:
-            m.update(line)
+        m = hashlib.md5()
+        try:
+            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
+                for chunk in iter(lambda: mm.read(8192), b''):
+                    m.update(chunk)
+        except ValueError:
+            # You can't mmap() an empty file so silence this exception
+            pass
     return m.hexdigest()
 
 def sha256_file(filename):
-- 
2.11.0



Thread overview: 6+ messages
2018-08-13 17:20 [PATCH 1/3] utils/md5_file: don't iterate line-by-line Ross Burton
2018-08-13 17:20 ` [PATCH 2/3] checksum: sanity check path when recursively checksumming Ross Burton
2018-08-13 17:20 ` [PATCH 3/3] classes: sanity-check LIC_FILES_CHKSUM Ross Burton
2018-08-13 17:32 ` ✗ patchtest: failure for "utils/md5_file: don't iterate ..." and 2 more Patchwork
2018-08-13 18:03 ` [PATCH 1/3] utils/md5_file: don't iterate line-by-line akuster808
2018-08-13 18:04   ` Burton, Ross
