Openembedded Core Discussions
From: akuster808 <akuster808@gmail.com>
To: Ross Burton <ross.burton@intel.com>,
	openembedded-core@lists.openembedded.org
Subject: Re: [PATCH 1/3] utils/md5_file: don't iterate line-by-line
Date: Mon, 13 Aug 2018 11:03:02 -0700
Message-ID: <bb9584cf-513d-16de-7b99-b40e5a9b7622@gmail.com>
In-Reply-To: <20180813172054.17767-1-ross.burton@intel.com>



On 08/13/2018 10:20 AM, Ross Burton wrote:
> Opening a file in binary mode and iterating it seems like the simple solution
> but will still break on newlines, which for binary files isn't really useful as
> the size of the chunks could be huge or tiny.
>
> Instead, let's be a bit more clever: we'll be MD5ing lots of files, but we don't
> want to fill up memory: use mmap() to open the file and read the file in 8k
> blocks.
>
> Signed-off-by: Ross Burton <ross.burton@intel.com>

Shouldn't this go to the bitbake mailing list?
> ---
>  bitbake/lib/bb/utils.py | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/bitbake/lib/bb/utils.py b/bitbake/lib/bb/utils.py
> index 9903183213b..b20cdabcf01 100644
> --- a/bitbake/lib/bb/utils.py
> +++ b/bitbake/lib/bb/utils.py
> @@ -524,12 +524,17 @@ def md5_file(filename):
>      """
>      Return the hex string representation of the MD5 checksum of filename.
>      """
> -    import hashlib
> -    m = hashlib.md5()
> +    import hashlib, mmap
>  
>      with open(filename, "rb") as f:
> -        for line in f:
> -            m.update(line)
> +        m = hashlib.md5()
> +        try:
> +            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
> +                for chunk in iter(lambda: mm.read(8192), b''):
> +                    m.update(chunk)
> +        except ValueError:
> +            # You can't mmap() an empty file so silence this exception
> +            pass
>      return m.hexdigest()
>  
>  def sha256_file(filename):
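
For readers who want to try the approach from the quoted patch outside of bitbake, here is a minimal standalone sketch. The function name md5_file_mmap and the blocksize parameter are illustrative assumptions and are not part of bb.utils; the body otherwise follows the patch: map the file read-only, feed it to hashlib in fixed 8k blocks, and treat the ValueError raised for an empty file as "nothing to hash".

```python
import hashlib
import mmap


def md5_file_mmap(filename, blocksize=8192):
    """
    Return the hex MD5 digest of filename, reading it through mmap()
    in fixed-size blocks instead of iterating line-by-line.

    Hypothetical helper for illustration; only the technique comes
    from the patch, the name and blocksize argument do not.
    """
    m = hashlib.md5()
    with open(filename, "rb") as f:
        try:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                # mm.read() returns b'' once the end of the mapping is
                # reached, which is the sentinel that stops iter().
                for chunk in iter(lambda: mm.read(blocksize), b""):
                    m.update(chunk)
        except ValueError:
            # An empty file cannot be mapped; the digest of no input
            # is still well defined, so fall through to hexdigest().
            pass
    return m.hexdigest()
```

Unlike "for line in f", the block size here does not depend on where newline bytes happen to fall in the data, so a binary file with no newlines is never pulled into memory as one huge "line".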



Thread overview: 6+ messages
2018-08-13 17:20 [PATCH 1/3] utils/md5_file: don't iterate line-by-line Ross Burton
2018-08-13 17:20 ` [PATCH 2/3] checksum: sanity check path when recursively checksumming Ross Burton
2018-08-13 17:20 ` [PATCH 3/3] classes: sanity-check LIC_FILES_CHKSUM Ross Burton
2018-08-13 17:32 ` ✗ patchtest: failure for "utils/md5_file: don't iterate ..." and 2 more Patchwork
2018-08-13 18:03 ` akuster808 [this message]
2018-08-13 18:04   ` [PATCH 1/3] utils/md5_file: don't iterate line-by-line Burton, Ross
