From: akuster808 <akuster808@gmail.com>
To: Ross Burton <ross.burton@intel.com>,
openembedded-core@lists.openembedded.org
Subject: Re: [PATCH 1/3] utils/md5_file: don't iterate line-by-line
Date: Mon, 13 Aug 2018 11:03:02 -0700
Message-ID: <bb9584cf-513d-16de-7b99-b40e5a9b7622@gmail.com>
In-Reply-To: <20180813172054.17767-1-ross.burton@intel.com>
On 08/13/2018 10:20 AM, Ross Burton wrote:
> Opening a file in binary mode and iterating it seems like the simple solution
> but will still break on newlines, which for binary files isn't really useful as
> the size of the chunks could be huge or tiny.
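
A small sketch of the point above, not part of the patch: iterating a
binary stream still splits on b"\n", so chunk sizes depend entirely on
where newline bytes happen to fall. io.BytesIO stands in here for a file
opened with "rb", and line_chunk_sizes is a hypothetical helper name used
only for illustration.

```python
import io

def line_chunk_sizes(data):
    """Return the sizes of the chunks produced by iterating a binary stream."""
    # Iterating a binary stream yields "lines" terminated by b"\n",
    # exactly like the old `for line in f` loop in md5_file().
    return [len(chunk) for chunk in io.BytesIO(data)]

# Newline-free data comes back as one chunk, however large it is ...
print(line_chunk_sizes(b"\x00" * 1000000))  # [1000000]
# ... while newline-heavy data comes back one byte at a time.
print(line_chunk_sizes(b"\n" * 5))          # [1, 1, 1, 1, 1]
```
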
>
> Instead, let's be a bit more clever: we'll be MD5ing lots of files, but we don't
> want to fill up memory: use mmap() to open the file and read the file in 8k
> blocks.
>
> Signed-off-by: Ross Burton <ross.burton@intel.com>
Shouldn't this go to the bitbake mailing list?
> ---
> bitbake/lib/bb/utils.py | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/bitbake/lib/bb/utils.py b/bitbake/lib/bb/utils.py
> index 9903183213b..b20cdabcf01 100644
> --- a/bitbake/lib/bb/utils.py
> +++ b/bitbake/lib/bb/utils.py
> @@ -524,12 +524,17 @@ def md5_file(filename):
>      """
>      Return the hex string representation of the MD5 checksum of filename.
>      """
> -    import hashlib
> -    m = hashlib.md5()
> +    import hashlib, mmap
>
>      with open(filename, "rb") as f:
> -        for line in f:
> -            m.update(line)
> +        m = hashlib.md5()
> +        try:
> +            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
> +                for chunk in iter(lambda: mm.read(8192), b''):
> +                    m.update(chunk)
> +        except ValueError:
> +            # You can't mmap() an empty file so silence this exception
> +            pass
>      return m.hexdigest()
>
>  def sha256_file(filename):
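
For reference, the approach in the hunk above can be tried outside bitbake
with a standalone sketch along these lines. The name md5_file_mmap and the
blocksize parameter are assumptions made for illustration; the patch itself
keeps the md5_file name and hard-codes 8192.

```python
import hashlib
import mmap

def md5_file_mmap(filename, blocksize=8192):
    """Return the hex MD5 digest of filename, read via mmap in fixed-size blocks."""
    m = hashlib.md5()
    with open(filename, "rb") as f:
        try:
            # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                # mm.read() returns b"" once the end of the mapping is reached,
                # which is the sentinel that stops iter().
                for chunk in iter(lambda: mm.read(blocksize), b""):
                    m.update(chunk)
        except ValueError:
            # mmap() rejects zero-length files; the digest of no data is still
            # well defined, so fall through with the untouched hash object.
            pass
    return m.hexdigest()
```

Memory use stays bounded by the block size no matter where newline bytes
fall in the file, and an empty file still yields the MD5 of zero bytes.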
Thread overview: 6+ messages
2018-08-13 17:20 [PATCH 1/3] utils/md5_file: don't iterate line-by-line Ross Burton
2018-08-13 17:20 ` [PATCH 2/3] checksum: sanity check path when recursively checksumming Ross Burton
2018-08-13 17:20 ` [PATCH 3/3] classes: sanity-check LIC_FILES_CHKSUM Ross Burton
2018-08-13 17:32 ` ✗ patchtest: failure for "utils/md5_file: don't iterate ..." and 2 more Patchwork
2018-08-13 18:03 ` akuster808 [this message]
2018-08-13 18:04 ` [PATCH 1/3] utils/md5_file: don't iterate line-by-line Burton, Ross