Openembedded Core Discussions
From: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
To: Joshua Watt <jpewhacker@gmail.com>,
	Jacob Kroon <jacob.kroon@gmail.com>,
	"openembedded-core@lists.openembedded.org"
	<openembedded-core@lists.openembedded.org>,
	"bitbake-devel@lists.openembedded.org"
	<bitbake-devel@lists.openembedded.org>
Subject: Re: [bitbake-devel] [PATCH v7 3/3] sstate: Implement hash equivalence sstate
Date: Fri, 11 Jan 2019 20:39:53 +0000
Message-ID: <ae964dc787d041459073dd050d9978b1@XBOX04.axis.com>
In-Reply-To: <4cb097db5f1dc0cb1df7375cce6c03b31b34b3ff.camel@gmail.com>

> -----Original Message-----
> From: bitbake-devel-bounces@lists.openembedded.org <bitbake-devel-
> bounces@lists.openembedded.org> On Behalf Of Joshua Watt
> Sent: den 9 januari 2019 18:10
> To: Jacob Kroon <jacob.kroon@gmail.com>; openembedded-
> core@lists.openembedded.org; bitbake-devel@lists.openembedded.org
> Subject: Re: [bitbake-devel] [OE-core][PATCH v7 3/3] sstate: Implement
> hash equivalence sstate
> 
> On Tue, 2019-01-08 at 07:29 +0100, Jacob Kroon wrote:
> > On 1/4/19 5:20 PM, Joshua Watt wrote:
> > > Converts sstate so that it can use a hash equivalence server to
> > > determine if a task really needs to be rebuilt, or if it can be
> > > restored
> > > from a different (equivalent) sstate object.
> > >
> > > The unique hashes are cached persistently using persist_data. This
> > > has
> > > a number of advantages:
> > >   1) Unique hashes can be cached between invocations of bitbake to
> > >      prevent needing to contact the server every time (which is
> > > slow)
> > > >   2) The value of each task's unique hash can easily be synchronized
> > >      between different threads, which will be useful if bitbake is
> > >      updated to do on the fly task re-hashing.
> > >
> > > [YOCTO #13030]
> > >
> > > Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> > > ---
> > >   meta/classes/sstate.bbclass | 105 +++++++++++++++++++++--
> > >   meta/conf/bitbake.conf      |   4 +-
> > >   meta/lib/oe/sstatesig.py    | 167
> > > ++++++++++++++++++++++++++++++++++++
> > >   3 files changed, 267 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/meta/classes/sstate.bbclass
> > > b/meta/classes/sstate.bbclass
> > > index 59ebc3ab5cc..da0807d6e99 100644
> > > --- a/meta/classes/sstate.bbclass
> > > +++ b/meta/classes/sstate.bbclass
> > > @@ -11,7 +11,7 @@ def generate_sstatefn(spec, hash, d):
> > >   SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
> > >   SSTATE_PKGSPEC    =
> > > "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-
> > > ${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
> > >   SSTATE_SWSPEC     =
> > > "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
> > > > -SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_TASKHASH'), d)}"
> > > > +SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
> > >   SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
> > >   SSTATE_EXTRAPATH   = ""
> > >   SSTATE_EXTRAPATHWILDCARD = ""
> > > @@ -82,6 +82,23 @@ SSTATE_SIG_PASSPHRASE ?= ""
> > >   # Whether to verify the GnUPG signatures when extracting sstate
> > > archives
> > >   SSTATE_VERIFY_SIG ?= "0"
> > >
> > > +SSTATE_HASHEQUIV_METHOD ?= "OEOuthashBasic"
> > > +SSTATE_HASHEQUIV_METHOD[doc] = "The function used to calculate the
> > > output hash \
> > > +    for a task, which in turn is used to determine equivalency. \
> > > +    "
> > > +
> > > +SSTATE_HASHEQUIV_SERVER ?= ""
> > > > +SSTATE_HASHEQUIV_SERVER[doc] = "The hash equivalence server. For
> > > example, \
> > > +    'http://192.168.0.1:5000'. Do not include a trailing slash \
> > > +    "
> > > +
> > > +SSTATE_HASHEQUIV_REPORT_TASKDATA ?= "0"
> > > +SSTATE_HASHEQUIV_REPORT_TASKDATA[doc] = "Report additional useful
> > > data to the \
> > > +    hash equivalency server, such as PN, PV, taskname, etc. This
> > > information \
> > > +    is very useful for developers looking at task data, but may
> > > leak sensitive \
> > > +    data if the equivalence server is public. \
> > > +    "
> > > +
> > >   python () {
> > >       if bb.data.inherits_class('native', d):
> > >           d.setVar('SSTATE_PKGARCH', d.getVar('BUILD_ARCH', False))
> > > @@ -640,7 +657,7 @@ def sstate_package(ss, d):
> > >           return
> > >
> > >       for f in (d.getVar('SSTATECREATEFUNCS') or '').split() + \
> > > -             ['sstate_create_package', 'sstate_sign_package'] + \
> > > +             ['sstate_report_unihash', 'sstate_create_package',
> > > 'sstate_sign_package'] + \
> > >                (d.getVar('SSTATEPOSTCREATEFUNCS') or '').split():
> > >           # All hooks should run in SSTATE_BUILDDIR.
> > >           bb.build.exec_func(f, d, (sstatebuild,))
> > > @@ -764,6 +781,73 @@ python sstate_sign_package () {
> > >                              d.getVar('SSTATE_SIG_PASSPHRASE'),
> > > armor=False)
> > >   }
> > >
> > > +def OEOuthashBasic(path, sigfile, task, d):
> > > +    import hashlib
> > > +    import stat
> > > +
> > > +    def update_hash(s):
> > > +        s = s.encode('utf-8')
> > > +        h.update(s)
> > > +        if sigfile:
> > > +            sigfile.write(s)
> > > +
> > > +    h = hashlib.sha256()
> > > +    prev_dir = os.getcwd()
> > > +
> > > +    try:
> > > +        os.chdir(path)
> > > +
> > > +        update_hash("OEOuthashBasic\n")
> > > +
> > > +        # It is only currently useful to get equivalent hashes for
> > > things that
> > > +        # can be restored from sstate. Since the sstate object is
> > > named using
> > > +        # SSTATE_PKGSPEC and the task name, those should be
> > > included in the
> > > +        # output hash calculation.
> > > +        update_hash("SSTATE_PKGSPEC=%s\n" %
> > > d.getVar('SSTATE_PKGSPEC'))
> > > +        update_hash("task=%s\n" % task)
> > > +
> > > +        for root, dirs, files in os.walk('.', topdown=True):
> > > +            # Sort directories and files to ensure consistent
> > > ordering
> > > +            dirs.sort()
> > > +            files.sort()
> > > +
> > > +            for f in files:
> > > +                path = os.path.join(root, f)
> > > +                s = os.lstat(path)
> > > +
> > > +                # Hash file path
> > > +                update_hash(path + '\n')
> > > +
> > > +                # Hash file mode
> > > +                update_hash("\tmode=0x%x\n" %
> > > stat.S_IMODE(s.st_mode))
> > > +                update_hash("\ttype=0x%x\n" %
> > > stat.S_IFMT(s.st_mode))
> > > +
> > > +                if stat.S_ISBLK(s.st_mode) or
> > > > stat.S_ISCHR(s.st_mode):
> > > +                    # Hash device major and minor
> > > +                    update_hash("\tdev=%d,%d\n" %
> > > (os.major(s.st_rdev), os.minor(s.st_rdev)))
> > > +                elif stat.S_ISLNK(s.st_mode):
> > > +                    # Hash symbolic link
> > > +                    update_hash("\tsymlink=%s\n" %
> > > os.readlink(path))
> > > +                else:
> > > +                    fh = hashlib.sha256()
> > > +                    # Hash file contents
> > > +                    with open(path, 'rb') as d:
> > > +                        for chunk in iter(lambda: d.read(4096),
> > > b""):
> > > +                            fh.update(chunk)
> > > +                    update_hash("\tdigest=%s\n" % fh.hexdigest())
> >
> > Would it be a good idea to make the depsig.do_* files even more
> > human
> > readable, considering that they could be candidates for being stored
> > in
> > buildhistory ?
> >
> > As an example, here's what buildhistory/.../files-in-package.txt for
> > busybox looks like:
> >
> > drwxr-xr-x root       root             4096 ./bin
> > lrwxrwxrwx root       root               14 ./bin/busybox ->
> > busybox.nosuid
> > -rwxr-xr-x root       root           547292 ./bin/busybox.nosuid
> > -rwsr-xr-x root       root            50860 ./bin/busybox.suid
> > lrwxrwxrwx root       root               14 ./bin/sh ->
> > busybox.nosuid
> > drwxr-xr-x root       root             4096 ./etc
> > -rw-r--r-- root       root             2339
> > ./etc/busybox.links.nosuid
> > -rw-r--r-- root       root               91 ./etc/busybox.links.suid
> >
> 
> I went through the effort to try this, and I'm pretty happy with the
> results except for one important distinction: It's not reproducible in
> all cases because of the inclusion of the owner UID/GID (I used the
> decimal user and group IDs to prevent the dependency on the names).
> 
> For any task running under fakeroot (pseudo), this works like you would
> expect. However, for tasks not running under fakeroot (and possibly
> others that copy files from tasks not running under fakeroot?), the
> files are owned by the user that is running bitbake (e.g. you). This
> makes the output hashes not shareable between different developers.
> 
> I'm not sure what the best way to address this is. The UID and GID are
> an important part of the reproducibility and *should* be included in
> the output hash when relevant, but I don't know yet how to determine if
> they are relevant. I'm going to dig in and see if I can use "the
> current task is running under fakeroot" as that distinction. If anyone
> has any other ideas please chime in.

You should probably not rely on UID/GID being stable for the target. That
is only the case if you have configured the build to use static IDs;
otherwise they are dynamically assigned and may vary between builds.
The user and group names should be stable, though.
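
A minimal sketch of hashing owner names instead of numeric IDs, for
illustration only. It is not part of the patch: describe_entry, tree_hash,
and the uid_names/gid_names mappings are hypothetical, and the sketch
assumes the caller can supply an ID-to-name mapping (for example, one taken
from the target's passwd and group files). The include_owner switch stands
in for whatever test ends up deciding that ownership is relevant, such as
"the task ran under fakeroot". Only metadata is hashed here; file contents
are left out to keep it short.

```python
import hashlib
import os
import stat


def describe_entry(full_path, rel_path, uid_names=None, gid_names=None,
                   include_owner=True):
    """Return one ls-like text line describing a file for the output hash."""
    s = os.lstat(full_path)
    fields = [stat.filemode(s.st_mode)]
    if include_owner:
        # Prefer names over numeric IDs; fall back to the number when the
        # (assumed, caller-supplied) mapping has no entry for it.
        fields.append((uid_names or {}).get(s.st_uid, str(s.st_uid)))
        fields.append((gid_names or {}).get(s.st_gid, str(s.st_gid)))
    fields.append(rel_path)
    if stat.S_ISLNK(s.st_mode):
        fields.append("-> " + os.readlink(full_path))
    return " ".join(fields)


def tree_hash(top, uid_names=None, gid_names=None, include_owner=True):
    """Hash the metadata lines of every file below top in sorted order."""
    h = hashlib.sha256()
    for root, dirs, files in os.walk(top, topdown=True):
        # Sort in place so the walk order is the same on every machine.
        dirs.sort()
        files.sort()
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, top)
            line = describe_entry(full, rel, uid_names, gid_names,
                                  include_owner)
            h.update((line + "\n").encode("utf-8"))
    return h.hexdigest()
```

With include_owner=False the result does not depend on who ran the build,
which is the behaviour wanted for tasks that do not run under fakeroot.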

//Peter


