* [RFC PATCH 0/3] WIP: add ability to reduce package feed churn
@ 2015-09-08 13:41 Paul Eggleton
2015-09-08 13:41 ` [RFC PATCH 1/3] classes/sstate: break out function to get sstate manifest filename Paul Eggleton
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Paul Eggleton @ 2015-09-08 13:41 UTC (permalink / raw)
To: openembedded-core
Based on some earlier thinking / work done by Richard Purdie and Randy Witt
I've been looking into how we could avoid newly versioned but otherwise
functionally identical packages built solely because of changes in
recipe dependencies entering the package feed (thus triggering unnecessary
upgrades on target). I've come up with a class that you can enable that
makes use of build-compare to compare the new package to one that was
previously generated and skip adding it to the feed if it hasn't
materially changed.
I suspect we may be able to go further than this in future so this isn't
necessarily the end game in terms of reducing the impact of signature
changes, but it's a start. More concretely I am a bit concerned that
build-compare itself needs further work, I am seeing errors in the task
logs that aren't causing the task to fail, probably the script needs
set -e at minimum (and the errors themselves need fixing). I wanted to
give people a bit of a preview of this though as it does have the
potential to be quite useful assuming the approach makes sense.
Please review the following changes for suitability for inclusion. If you have
any objections or suggestions for improvement, please respond to the patches.
The following changes since commit 8402958cd2cb87b8283c8ee4e2d08e1a6717d67a:
pseudo_1.7.3.bb: New version of pseudo (2015-09-06 15:24:28 +0100)
are available in the git repository at:
git://git.openembedded.org/openembedded-core-contrib paule/packagefeed-stability
http://cgit.openembedded.org/cgit.cgi/openembedded-core-contrib/log/?h=paule/packagefeed-stability
Paul Eggleton (3):
classes/sstate: break out function to get sstate manifest filename
build-compare: add support for examining deb and ipk packages
WIP: classes/packagefeed-stability: add class to help reduce package
feed churn
meta/classes/packagefeed-stability.bbclass | 270 +++++++++++++++++++++
meta/classes/sstate.bbclass | 7 +-
meta/lib/oe/sstatesig.py | 12 +
.../build-compare/build-compare_git.bb | 8 +-
...001-Add-support-for-deb-and-ipk-packaging.patch | 64 +++++
5 files changed, 353 insertions(+), 8 deletions(-)
create mode 100644 meta/classes/packagefeed-stability.bbclass
create mode 100644 meta/recipes-devtools/build-compare/files/0001-Add-support-for-deb-and-ipk-packaging.patch
--
2.1.0
^ permalink raw reply [flat|nested] 7+ messages in thread* [RFC PATCH 1/3] classes/sstate: break out function to get sstate manifest filename 2015-09-08 13:41 [RFC PATCH 0/3] WIP: add ability to reduce package feed churn Paul Eggleton @ 2015-09-08 13:41 ` Paul Eggleton 2015-09-08 13:41 ` [RFC PATCH 2/3] build-compare: add support for examining deb and ipk packages Paul Eggleton 2015-09-08 13:41 ` [RFC PATCH 3/3] WIP: classes/packagefeed-stability: add class to help reduce package feed churn Paul Eggleton 2 siblings, 0 replies; 7+ messages in thread From: Paul Eggleton @ 2015-09-08 13:41 UTC (permalink / raw) To: openembedded-core It is useful in a few different contexts to see which files have been written out by an sstate task; break out a function that lets us get the path to the manifest file easily. Signed-off-by: Paul Eggleton <paul.eggleton@linux.intel.com> --- meta/classes/sstate.bbclass | 7 ++----- meta/lib/oe/sstatesig.py | 12 ++++++++++++ 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass index 77313bc..b75700eb 100644 --- a/meta/classes/sstate.bbclass +++ b/meta/classes/sstate.bbclass @@ -157,17 +157,14 @@ def sstate_add(ss, source, dest, d): def sstate_install(ss, d): import oe.path + import oe.sstatesig import subprocess sharedfiles = [] shareddirs = [] bb.utils.mkdirhier(d.expand("${SSTATE_MANIFESTS}")) - d2 = d.createCopy() - extrainf = d.getVarFlag("do_" + ss['task'], 'stamp-extra-info', True) - if extrainf: - d2.setVar("SSTATE_MANMACH", extrainf) - manifest = d2.expand("${SSTATE_MANFILEPREFIX}.%s" % ss['task']) + manifest, d2 = oe.sstatesig.sstate_get_manifest_filename(ss['task'], d) if os.access(manifest, os.R_OK): bb.fatal("Package already staged (%s)?!" % manifest) diff --git a/meta/lib/oe/sstatesig.py b/meta/lib/oe/sstatesig.py index 9d6d7c4..cb46712 100644 --- a/meta/lib/oe/sstatesig.py +++ b/meta/lib/oe/sstatesig.py @@ -277,3 +277,15 @@ def find_siginfo(pn, taskname, taskhashlist, d): return filedates bb.siggen.find_siginfo = find_siginfo + + +def sstate_get_manifest_filename(task, d): + """ + Return the sstate manifest file path for a particular task. + Also returns the datastore that can be used to query related variables. + """ + d2 = d.createCopy() + extrainf = d.getVarFlag("do_" + task, 'stamp-extra-info', True) + if extrainf: + d2.setVar("SSTATE_MANMACH", extrainf) + return (d2.expand("${SSTATE_MANFILEPREFIX}.%s" % task), d2) -- 2.1.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [RFC PATCH 2/3] build-compare: add support for examining deb and ipk packages 2015-09-08 13:41 [RFC PATCH 0/3] WIP: add ability to reduce package feed churn Paul Eggleton 2015-09-08 13:41 ` [RFC PATCH 1/3] classes/sstate: break out function to get sstate manifest filename Paul Eggleton @ 2015-09-08 13:41 ` Paul Eggleton 2015-09-08 13:41 ` [RFC PATCH 3/3] WIP: classes/packagefeed-stability: add class to help reduce package feed churn Paul Eggleton 2 siblings, 0 replies; 7+ messages in thread From: Paul Eggleton @ 2015-09-08 13:41 UTC (permalink / raw) To: openembedded-core This is just rudimentary support at the moment as we'd potentially want to compare the control files a bit more specifically than this does, but it's a start. Signed-off-by: Paul Eggleton <paul.eggleton@linux.intel.com> --- .../build-compare/build-compare_git.bb | 8 ++- ...001-Add-support-for-deb-and-ipk-packaging.patch | 64 ++++++++++++++++++++++ 2 files changed, 69 insertions(+), 3 deletions(-) create mode 100644 meta/recipes-devtools/build-compare/files/0001-Add-support-for-deb-and-ipk-packaging.patch diff --git a/meta/recipes-devtools/build-compare/build-compare_git.bb b/meta/recipes-devtools/build-compare/build-compare_git.bb index 418aee0..7ac3784 100644 --- a/meta/recipes-devtools/build-compare/build-compare_git.bb +++ b/meta/recipes-devtools/build-compare/build-compare_git.bb @@ -5,9 +5,11 @@ HOMEPAGE = "https://github.com/openSUSE/build-compare" LICENSE = "GPLv2" LIC_FILES_CHKSUM = "file://COPYING;md5=751419260aa954499f7abaabaa882bbe" -SRC_URI = "git://github.com/openSUSE/build-compare.git" -SRC_URI += "file://Rename-rpm-check.sh-to-pkg-diff.sh.patch;striplevel=1" -SRC_URI += "file://Ignore-DWARF-sections.patch;striplevel=1" +SRC_URI = "git://github.com/openSUSE/build-compare.git \ + file://Rename-rpm-check.sh-to-pkg-diff.sh.patch;striplevel=1 \ + file://Ignore-DWARF-sections.patch;striplevel=1 \ + file://0001-Add-support-for-deb-and-ipk-packaging.patch \ + " PATCHTOOL = "git" SRCREV = "c5352c054c6ef15735da31b76d6d88620f4aff0a" diff --git a/meta/recipes-devtools/build-compare/files/0001-Add-support-for-deb-and-ipk-packaging.patch b/meta/recipes-devtools/build-compare/files/0001-Add-support-for-deb-and-ipk-packaging.patch new file mode 100644 index 0000000..5c15218 --- /dev/null +++ b/meta/recipes-devtools/build-compare/files/0001-Add-support-for-deb-and-ipk-packaging.patch @@ -0,0 +1,64 @@ +From 02dbc7e3478e409d6f5e3e1c53daddf8838be999 Mon Sep 17 00:00:00 2001 +From: Paul Eggleton <paul.eggleton@linux.intel.com> +Date: Tue, 1 Sep 2015 12:04:33 +0100 +Subject: [PATCH] Add support for deb and ipk packaging + +Upstream-Status: Pending + +Signed-off-by: Paul Eggleton <paul.eggleton@linux.intel.com> +--- + functions.sh | 15 +++++++++++++++ + pkg-diff.sh | 6 ++++++ + 2 files changed, 21 insertions(+) + +diff --git a/functions.sh b/functions.sh +index 06079df..85c9003 100644 +--- a/functions.sh ++++ b/functions.sh +@@ -85,6 +85,13 @@ function unpackage() + CPIO_OPTS="--extract --unconditional --preserve-modification-time --make-directories --quiet" + rpm2cpio $file | cpio ${CPIO_OPTS} + ;; ++ *.ipk|*.deb) ++ ar x $file ++ tar xf control.tar.gz ++ rm control.tar.gz ++ tar xf data.tar.gz ++ rm data.tar.gz ++ ;; + esac + popd 1>/dev/null + } +@@ -255,4 +262,12 @@ function cmp_spec () + rm $file1 $file2 + return $RES + } ++ ++function adjust_controlfile () { ++ cat $1/control | sed '/^Version: /d' > $1/control.fixed ++ mv $1/control.fixed $1/control ++ cat $2/control | sed '/^Version: /d' > $2/control.fixed ++ mv $2/control.fixed $2/control ++} ++ + # vim: tw=666 ts=2 et +diff --git a/pkg-diff.sh b/pkg-diff.sh +index 0f1fa76..3cf10aa 100644 +--- a/pkg-diff.sh ++++ b/pkg-diff.sh +@@ -138,6 +138,12 @@ echo "Extracting packages" + unpackage $oldpkg $dir/old + unpackage $newpkg $dir/new + ++case $oldpkg in ++ *.deb|*.ipk) ++ adjust_controlfile $dir/old $dir/new ++ ;; ++esac ++ + # files is set in cmp_spec for rpms, so if RES is empty we should assume + # it wasn't an rpm and pick all files for comparison. + if [ -z $RES ]; then +-- +2.1.0 + -- 2.1.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [RFC PATCH 3/3] WIP: classes/packagefeed-stability: add class to help reduce package feed churn 2015-09-08 13:41 [RFC PATCH 0/3] WIP: add ability to reduce package feed churn Paul Eggleton 2015-09-08 13:41 ` [RFC PATCH 1/3] classes/sstate: break out function to get sstate manifest filename Paul Eggleton 2015-09-08 13:41 ` [RFC PATCH 2/3] build-compare: add support for examining deb and ipk packages Paul Eggleton @ 2015-09-08 13:41 ` Paul Eggleton 2015-09-08 14:09 ` Mark Hatle 2 siblings, 1 reply; 7+ messages in thread From: Paul Eggleton @ 2015-09-08 13:41 UTC (permalink / raw) To: openembedded-core When a dependency causes a recipe to effectively be rebuilt, its output may in fact not change; but new packages (with an increased PR value, if using the PR server) will be generated nonetheless. There's no practical way for us to predict whether or not this is going to be the case based solely on the inputs, but we can compare the package output and see if that is materially different and based upon that decide to replace the old package with the new one. This class effectively intercepts packages as they are written out by do_package_write_*, causing them to be written into a different directory where we can compare them to whatever older packages might be in the "real" package feed directory, and avoid copying the new package to the feed if it has not materially changed. We use build-compare to do the package comparison. (NOTE: this is still a work in progress and there are no doubt unresolved issues.) Signed-off-by: Paul Eggleton <paul.eggleton@linux.intel.com> --- meta/classes/packagefeed-stability.bbclass | 270 +++++++++++++++++++++++++++++ 1 file changed, 270 insertions(+) create mode 100644 meta/classes/packagefeed-stability.bbclass diff --git a/meta/classes/packagefeed-stability.bbclass b/meta/classes/packagefeed-stability.bbclass new file mode 100644 index 0000000..b5207d9 --- /dev/null +++ b/meta/classes/packagefeed-stability.bbclass @@ -0,0 +1,270 @@ +# Class to avoid copying packages into the feed if they haven't materially changed +# +# Copyright (C) 2015 Intel Corporation +# Released under the MIT license (see COPYING.MIT for details) +# +# This class effectively intercepts packages as they are written out by +# do_package_write_*, causing them to be written into a different +# directory where we can compare them to whatever older packages might +# be in the "real" package feed directory, and avoid copying the new +# package to the feed if it has not materially changed. The idea is to +# avoid unnecessary churn in the packages when dependencies trigger task +# reexecution (and thus repackaging). Enabling the class is simple: +# +# INHERIT += "packagefeed-stability" +# +# Caveats: +# 1) Latest PR values in the build system may not match those in packages +# seen on the target (naturally) +# 2) If you rebuild from sstate without the existing package feed present, +# you will lose the "state" of the package feed i.e. the preserved old +# package versions. Not the end of the world, but would negate the +# entire purpose of this class. +# +# Note that running -c cleanall on a recipe will purposely delete the old +# package files so they will definitely be copied the next time. + +python() { + # Package backend agnostic intercept + # This assumes that the package_write task is called package_write_<pkgtype> + # and that the directory in which packages should be written is + # pointed to by the variable DEPLOY_DIR_<PKGTYPE> + for pkgclass in (d.getVar('PACKAGE_CLASSES', True) or '').split(): + if pkgclass.startswith('package_'): + pkgtype = pkgclass.split('_', 1)[1] + pkgwritefunc = 'do_package_write_%s' % pkgtype + sstate_outputdirs = d.getVarFlag(pkgwritefunc, 'sstate-outputdirs', False) + deploydirvar = 'DEPLOY_DIR_%s' % pkgtype.upper() + deploydirvarref = '${' + deploydirvar + '}' + pkgcomparefunc = 'do_package_compare_%s' % pkgtype + + if bb.data.inherits_class('image', d): + d.appendVarFlag('do_rootfs', 'recrdeptask', ' ' + pkgcomparefunc) + + if bb.data.inherits_class('populate_sdk_base', d): + d.appendVarFlag('do_populate_sdk', 'recrdeptask', ' ' + pkgcomparefunc) + + if bb.data.inherits_class('populate_sdk_ext', d): + d.appendVarFlag('do_populate_sdk_ext', 'recrdeptask', ' ' + pkgcomparefunc) + + d.appendVarFlag('do_build', 'recrdeptask', ' ' + pkgcomparefunc) + + if d.getVarFlag(pkgwritefunc, 'noexec', True) or (not d.getVarFlag(pkgwritefunc, 'task', True)) or pkgwritefunc in (d.getVar('__BBDELTASKS', True) or []): + # Packaging is disabled for this recipe, we shouldn't do anything + continue + + if deploydirvarref in sstate_outputdirs: + # Set intermediate output directory + d.setVarFlag(pkgwritefunc, 'sstate-outputdirs', sstate_outputdirs.replace(deploydirvarref, deploydirvarref + '-prediff')) + + d.setVar(pkgcomparefunc, d.getVar('do_package_compare', False)) + d.setVarFlags(pkgcomparefunc, d.getVarFlags('do_package_compare', False)) + d.appendVarFlag(pkgcomparefunc, 'depends', ' build-compare-native:do_populate_sysroot') + bb.build.addtask(pkgcomparefunc, 'do_build', 'do_packagedata ' + pkgwritefunc, d) +} + +# This isn't the real task function - it's a template that we use in the +# anonymous python code above +fakeroot python do_package_compare () { + currenttask = d.getVar('BB_CURRENTTASK', True) + pkgtype = currenttask.rsplit('_', 1)[1] + package_compare_impl(pkgtype, d) +} + +def package_compare_impl(pkgtype, d): + import errno + import fnmatch + import glob + import subprocess + import oe.sstatesig + + pn = d.getVar('PN', True) + deploydir = d.getVar('DEPLOY_DIR_%s' % pkgtype.upper(), True) + prepath = deploydir + '-prediff/' + + # Find out PKGR values are + pkgdatadir = d.getVar('PKGDATA_DIR', True) + packages = [] + try: + with open(os.path.join(pkgdatadir, pn), 'r') as f: + for line in f: + if line.startswith('PACKAGES:'): + packages = line.split(':', 1)[1].split() + break + except IOError as e: + if e.errno == errno.ENOENT: + pass + + if not packages: + bb.debug(2, '%s: no packages, nothing to do' % pn) + return + + pkgrvalues = {} + rpkgnames = {} + rdepends = {} + pkgvvalues = {} + for pkg in packages: + with open(os.path.join(pkgdatadir, 'runtime', pkg), 'r') as f: + for line in f: + if line.startswith('PKGR:'): + pkgrvalues[pkg] = line.split(':', 1)[1].strip() + if line.startswith('PKGV:'): + pkgvvalues[pkg] = line.split(':', 1)[1].strip() + elif line.startswith('PKG_%s:' % pkg): + rpkgnames[pkg] = line.split(':', 1)[1].strip() + elif line.startswith('RDEPENDS_%s:' % pkg): + rdepends[pkg] = line.split(':', 1)[1].strip() + + # Prepare a list of the runtime package names for packages that were + # actually produced + rpkglist = [] + for pkg, rpkg in rpkgnames.iteritems(): + if os.path.exists(os.path.join(pkgdatadir, 'runtime', pkg + '.packaged')): + rpkglist.append((rpkg, pkg)) + rpkglist.sort(key=lambda x: len(x[0]), reverse=True) + + pvu = d.getVar('PV', False) + if '$' + '{SRCPV}' in pvu: + pvprefix = pvu.split('$' + '{SRCPV}', 1)[0] + else: + pvprefix = None + + pkgwritetask = 'package_write_%s' % pkgtype + files = [] + copypkgs = [] + manifest, _ = oe.sstatesig.sstate_get_manifest_filename(pkgwritetask, d) + with open(manifest, 'r') as f: + for line in f: + if line.startswith(prepath): + srcpath = line.rstrip() + if os.path.isfile(srcpath): + destpath = os.path.join(deploydir, os.path.relpath(srcpath, prepath)) + + # This is crude but should work assuming the output + # package file name starts with the package name + # and rpkglist is sorted by length (descending) + pkgbasename = os.path.basename(destpath) + pkgname = None + for rpkg, pkg in rpkglist: + if pkgbasename.startswith(rpkg): + pkgr = pkgrvalues[pkg] + destpathspec = destpath.replace(pkgr, '*') + if pvprefix: + pkgv = pkgvvalues[pkg] + if pkgv.startswith(pvprefix): + pkgvsuffix = pkgv[len(pvprefix):] + if '+' in pkgvsuffix: + newpkgv = pvprefix + '*+' + pkgvsuffix.split('+', 1)[1] + destpathspec = destpathspec.replace(pkgv, newpkgv) + pkgname = pkg + break + else: + bb.warn('Unable to map %s back to package' % pkgbasename) + destpathspec = destpath + + oldfiles = glob.glob(destpathspec) + oldfile = None + docopy = True + if oldfiles: + oldfile = oldfiles[-1] + result = subprocess.call(['pkg-diff.sh', oldfile, srcpath]) + if result == 0: + docopy = False + + files.append((pkgname, pkgbasename, srcpath, oldfile, destpath)) + bb.debug(2, '%s: package %s %s' % (pn, files[-1], docopy)) + if docopy: + copypkgs.append(pkgname) + + # Ensure that dependencies on specific versions (such as -dev on the + # main package) are copied in lock-step + changed = True + while changed: + rpkgdict = {x[0]: x[1] for x in rpkglist} + changed = False + for pkgname, pkgbasename, srcpath, oldfile, destpath in files: + rdeps = rdepends.get(pkgname, None) + if not rdeps: + continue + rdepvers = bb.utils.explode_dep_versions2(rdeps) + for rdep, versions in rdepvers.iteritems(): + dep = rpkgdict.get(rdep, None) + for version in versions: + if version and version.startswith('= '): + if dep in copypkgs and not pkgname in copypkgs: + bb.debug(2, '%s: copying %s because it has a fixed version dependency on %s and that package is going to be copied' % (pn, pkgname, dep)) + changed = True + copypkgs.append(pkgname) + elif pkgname in copypkgs and not dep in copypkgs: + bb.debug(2, '%s: copying %s because %s has a fixed version dependency on it and that package is going to be copied' % (pn, dep, pkgname)) + changed = True + copypkgs.append(dep) + + # Read in old manifest so we can delete any packages we aren't going to replace or preserve + pcmanifest = os.path.join(prepath, d.expand('pkg-compare-manifest-${MULTIMACH_TARGET_SYS}-${PN}')) + try: + with open(pcmanifest, 'r') as f: + knownfiles = [x[3] for x in files if x[3]] + for line in f: + fn = line.rstrip() + if fn: + if fn in knownfiles: + knownfiles.remove(fn) + else: + try: + os.remove(fn) + bb.warn('Removed old package %s' % fn) + except OSError as e: + if e.errno == errno.ENOENT: + pass + except IOError as e: + if e.errno == errno.ENOENT: + pass + + # Create new manifest + with open(pcmanifest, 'w') as f: + for pkgname, pkgbasename, srcpath, oldfile, destpath in files: + if pkgname in copypkgs: + bb.warn('Copying %s' % pkgbasename) + destdir = os.path.dirname(destpath) + bb.utils.mkdirhier(destdir) + if oldfile: + try: + os.remove(oldfile) + except OSError as e: + if e.errno == errno.ENOENT: + pass + if (os.stat(srcpath).st_dev == os.stat(destdir).st_dev): + # Use a hard link to save space + os.link(srcpath, destpath) + else: + shutil.copyfile(srcpath, destpath) + f.write('%s\n' % destpath) + else: + bb.warn('Not copying %s' % pkgbasename) + f.write('%s\n' % oldfile) + + +do_cleanall_append() { + import errno + for pkgclass in (d.getVar('PACKAGE_CLASSES', True) or '').split(): + if pkgclass.startswith('package_'): + pkgtype = pkgclass.split('_', 1)[1] + deploydir = d.getVar('DEPLOY_DIR_%s' % pkgtype.upper(), True) + prepath = deploydir + '-prediff' + pcmanifest = os.path.join(prepath, d.expand('pkg-compare-manifest-${MULTIMACH_TARGET_SYS}-${PN}')) + try: + with open(pcmanifest, 'r') as f: + for line in f: + fn = line.rstrip() + if fn: + try: + os.remove(fn) + except OSError as e: + if e.errno == errno.ENOENT: + pass + os.remove(pcmanifest) + except IOError as e: + if e.errno == errno.ENOENT: + pass +} -- 2.1.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC PATCH 3/3] WIP: classes/packagefeed-stability: add class to help reduce package feed churn 2015-09-08 13:41 ` [RFC PATCH 3/3] WIP: classes/packagefeed-stability: add class to help reduce package feed churn Paul Eggleton @ 2015-09-08 14:09 ` Mark Hatle 2015-09-08 15:03 ` Paul Eggleton 0 siblings, 1 reply; 7+ messages in thread From: Mark Hatle @ 2015-09-08 14:09 UTC (permalink / raw) To: Paul Eggleton, openembedded-core I've got some scripts where I do this externally of the build system. So a few comments on the stuff below based on what I found working on those scripts. (Basically the system deploys as normal.. then we run the scripts to copy the deploy directory contents to a release directory... we do the build-compare as part of that copy...) See below inline... On 9/8/15 8:41 AM, Paul Eggleton wrote: > When a dependency causes a recipe to effectively be rebuilt, its output > may in fact not change; but new packages (with an increased PR value, if > using the PR server) will be generated nonetheless. There's no practical > way for us to predict whether or not this is going to be the case based > solely on the inputs, but we can compare the package output and see if > that is materially different and based upon that decide to replace the > old package with the new one. > > This class effectively intercepts packages as they are written out by > do_package_write_*, causing them to be written into a different > directory where we can compare them to whatever older packages might > be in the "real" package feed directory, and avoid copying the new > package to the feed if it has not materially changed. We use > build-compare to do the package comparison. > > (NOTE: this is still a work in progress and there are no doubt > unresolved issues.) > > Signed-off-by: Paul Eggleton <paul.eggleton@linux.intel.com> > --- > meta/classes/packagefeed-stability.bbclass | 270 +++++++++++++++++++++++++++++ > 1 file changed, 270 insertions(+) > create mode 100644 meta/classes/packagefeed-stability.bbclass > > diff --git a/meta/classes/packagefeed-stability.bbclass b/meta/classes/packagefeed-stability.bbclass > new file mode 100644 > index 0000000..b5207d9 > --- /dev/null > +++ b/meta/classes/packagefeed-stability.bbclass > @@ -0,0 +1,270 @@ > +# Class to avoid copying packages into the feed if they haven't materially changed > +# > +# Copyright (C) 2015 Intel Corporation > +# Released under the MIT license (see COPYING.MIT for details) > +# > +# This class effectively intercepts packages as they are written out by > +# do_package_write_*, causing them to be written into a different > +# directory where we can compare them to whatever older packages might > +# be in the "real" package feed directory, and avoid copying the new > +# package to the feed if it has not materially changed. The idea is to > +# avoid unnecessary churn in the packages when dependencies trigger task > +# reexecution (and thus repackaging). Enabling the class is simple: > +# > +# INHERIT += "packagefeed-stability" > +# > +# Caveats: > +# 1) Latest PR values in the build system may not match those in packages > +# seen on the target (naturally) > +# 2) If you rebuild from sstate without the existing package feed present, > +# you will lose the "state" of the package feed i.e. the preserved old > +# package versions. Not the end of the world, but would negate the > +# entire purpose of this class. > +# > +# Note that running -c cleanall on a recipe will purposely delete the old > +# package files so they will definitely be copied the next time. > + > +python() { > + # Package backend agnostic intercept > + # This assumes that the package_write task is called package_write_<pkgtype> > + # and that the directory in which packages should be written is > + # pointed to by the variable DEPLOY_DIR_<PKGTYPE> > + for pkgclass in (d.getVar('PACKAGE_CLASSES', True) or '').split(): > + if pkgclass.startswith('package_'): > + pkgtype = pkgclass.split('_', 1)[1] > + pkgwritefunc = 'do_package_write_%s' % pkgtype > + sstate_outputdirs = d.getVarFlag(pkgwritefunc, 'sstate-outputdirs', False) > + deploydirvar = 'DEPLOY_DIR_%s' % pkgtype.upper() > + deploydirvarref = '${' + deploydirvar + '}' > + pkgcomparefunc = 'do_package_compare_%s' % pkgtype > + > + if bb.data.inherits_class('image', d): > + d.appendVarFlag('do_rootfs', 'recrdeptask', ' ' + pkgcomparefunc) > + > + if bb.data.inherits_class('populate_sdk_base', d): > + d.appendVarFlag('do_populate_sdk', 'recrdeptask', ' ' + pkgcomparefunc) > + > + if bb.data.inherits_class('populate_sdk_ext', d): > + d.appendVarFlag('do_populate_sdk_ext', 'recrdeptask', ' ' + pkgcomparefunc) > + > + d.appendVarFlag('do_build', 'recrdeptask', ' ' + pkgcomparefunc) > + > + if d.getVarFlag(pkgwritefunc, 'noexec', True) or (not d.getVarFlag(pkgwritefunc, 'task', True)) or pkgwritefunc in (d.getVar('__BBDELTASKS', True) or []): > + # Packaging is disabled for this recipe, we shouldn't do anything > + continue > + > + if deploydirvarref in sstate_outputdirs: > + # Set intermediate output directory > + d.setVarFlag(pkgwritefunc, 'sstate-outputdirs', sstate_outputdirs.replace(deploydirvarref, deploydirvarref + '-prediff')) > + > + d.setVar(pkgcomparefunc, d.getVar('do_package_compare', False)) > + d.setVarFlags(pkgcomparefunc, d.getVarFlags('do_package_compare', False)) > + d.appendVarFlag(pkgcomparefunc, 'depends', ' build-compare-native:do_populate_sysroot') > + bb.build.addtask(pkgcomparefunc, 'do_build', 'do_packagedata ' + pkgwritefunc, d) > +} > + > +# This isn't the real task function - it's a template that we use in the > +# anonymous python code above > +fakeroot python do_package_compare () { > + currenttask = d.getVar('BB_CURRENTTASK', True) > + pkgtype = currenttask.rsplit('_', 1)[1] > + package_compare_impl(pkgtype, d) > +} > + > +def package_compare_impl(pkgtype, d): > + import errno > + import fnmatch > + import glob > + import subprocess > + import oe.sstatesig > + > + pn = d.getVar('PN', True) > + deploydir = d.getVar('DEPLOY_DIR_%s' % pkgtype.upper(), True) > + prepath = deploydir + '-prediff/' > + > + # Find out PKGR values are > + pkgdatadir = d.getVar('PKGDATA_DIR', True) > + packages = [] > + try: > + with open(os.path.join(pkgdatadir, pn), 'r') as f: > + for line in f: > + if line.startswith('PACKAGES:'): > + packages = line.split(':', 1)[1].split() > + break > + except IOError as e: > + if e.errno == errno.ENOENT: > + pass > + > + if not packages: > + bb.debug(2, '%s: no packages, nothing to do' % pn) > + return > + > + pkgrvalues = {} > + rpkgnames = {} > + rdepends = {} > + pkgvvalues = {} > + for pkg in packages: > + with open(os.path.join(pkgdatadir, 'runtime', pkg), 'r') as f: > + for line in f: > + if line.startswith('PKGR:'): > + pkgrvalues[pkg] = line.split(':', 1)[1].strip() > + if line.startswith('PKGV:'): > + pkgvvalues[pkg] = line.split(':', 1)[1].strip() > + elif line.startswith('PKG_%s:' % pkg): > + rpkgnames[pkg] = line.split(':', 1)[1].strip() > + elif line.startswith('RDEPENDS_%s:' % pkg): > + rdepends[pkg] = line.split(':', 1)[1].strip() > + > + # Prepare a list of the runtime package names for packages that were > + # actually produced > + rpkglist = [] > + for pkg, rpkg in rpkgnames.iteritems(): > + if os.path.exists(os.path.join(pkgdatadir, 'runtime', pkg + '.packaged')): > + rpkglist.append((rpkg, pkg)) > + rpkglist.sort(key=lambda x: len(x[0]), reverse=True) > + > + pvu = d.getVar('PV', False) > + if '$' + '{SRCPV}' in pvu: > + pvprefix = pvu.split('$' + '{SRCPV}', 1)[0] > + else: > + pvprefix = None > + > + pkgwritetask = 'package_write_%s' % pkgtype > + files = [] > + copypkgs = [] > + manifest, _ = oe.sstatesig.sstate_get_manifest_filename(pkgwritetask, d) > + with open(manifest, 'r') as f: > + for line in f: > + if line.startswith(prepath): Is this comparing the -packages- or the -files-? I ask, because if you process the packages you can short-cut some of the comparisons.. it makes a huge improvement in speed. Specifically, skip processing all '-dev' and '-dbg' packages. (Not -staticdev). The -dev and -dbg components need to match the runtime items. If the runtime items have no changed then there is no reason that the -dev and -dbg items would have changed -- other then timestamps and file paths.. which the build-compare filters out anyway. > + srcpath = line.rstrip() > + if os.path.isfile(srcpath): > + destpath = os.path.join(deploydir, os.path.relpath(srcpath, prepath)) > + > + # This is crude but should work assuming the output > + # package file name starts with the package name > + # and rpkglist is sorted by length (descending) > + pkgbasename = os.path.basename(destpath) > + pkgname = None > + for rpkg, pkg in rpkglist: > + if pkgbasename.startswith(rpkg): > + pkgr = pkgrvalues[pkg] > + destpathspec = destpath.replace(pkgr, '*') > + if pvprefix: > + pkgv = pkgvvalues[pkg] > + if pkgv.startswith(pvprefix): > + pkgvsuffix = pkgv[len(pvprefix):] > + if '+' in pkgvsuffix: > + newpkgv = pvprefix + '*+' + pkgvsuffix.split('+', 1)[1] > + destpathspec = destpathspec.replace(pkgv, newpkgv) > + pkgname = pkg > + break > + else: > + bb.warn('Unable to map %s back to package' % pkgbasename) > + destpathspec = destpath > + > + oldfiles = glob.glob(destpathspec) > + oldfile = None > + docopy = True > + if oldfiles: > + oldfile = oldfiles[-1] > + result = subprocess.call(['pkg-diff.sh', oldfile, srcpath]) > + if result == 0: > + docopy = False > + If any of the packages differ, then -all- of the packages differ. You can drop out of the loop at that point and simply copy the full set of packages over at this point. The reason for this is that the packages from a single recipe may (and often do) have PN-PV-PR dependencies in them -- so they have to be kept as a unit. Also the -dbg package contains the overall debug information for the full set of packages produced by a recipe.. so if you change a package the -dbg has to be updated -- since the -dbg was updated, everything has to be updated. So drop the loop, abort the compares and copy the full manifest set over in one shot.. For large packages like perl and python it makes a HUGH difference to avoid all of the extra comparisons. (I also did this because there is some concern in my mind if releasing only part of a recipe build ever makes sense.. because I've not gotten into the situation where the only way to reproduce the released feed would be to build the old version first, then come in and build the new version next.. this isn't a typical workflow and greatly complicates things.) > + files.append((pkgname, pkgbasename, srcpath, oldfile, destpath)) > + bb.debug(2, '%s: package %s %s' % (pn, files[-1], docopy)) > + if docopy: > + copypkgs.append(pkgname) > + Making the change above would eliminate the need for the code below.. and reduce the possibility of things getting out of sync. > + # Ensure that dependencies on specific versions (such as -dev on the > + # main package) are copied in lock-step > + changed = True > + while changed: > + rpkgdict = {x[0]: x[1] for x in rpkglist} > + changed = False > + for pkgname, pkgbasename, srcpath, oldfile, destpath in files: > + rdeps = rdepends.get(pkgname, None) > + if not rdeps: > + continue > + rdepvers = bb.utils.explode_dep_versions2(rdeps) > + for rdep, versions in rdepvers.iteritems(): > + dep = rpkgdict.get(rdep, None) > + for version in versions: > + if version and version.startswith('= '): > + if dep in copypkgs and not pkgname in copypkgs: > + bb.debug(2, '%s: copying %s because it has a fixed version dependency on %s and that package is going to be copied' % (pn, pkgname, dep)) > + changed = True > + copypkgs.append(pkgname) > + elif pkgname in copypkgs and not dep in copypkgs: > + bb.debug(2, '%s: copying %s because %s has a fixed version dependency on it and that package is going to be copied' % (pn, dep, pkgname)) > + changed = True > + copypkgs.append(dep) > + > + # Read in old manifest so we can delete any packages we aren't going to replace or preserve > + pcmanifest = os.path.join(prepath, d.expand('pkg-compare-manifest-${MULTIMACH_TARGET_SYS}-${PN}')) > + try: > + with open(pcmanifest, 'r') as f: > + knownfiles = [x[3] for x in files if x[3]] > + for line in f: > + fn = line.rstrip() > + if fn: > + if fn in knownfiles: > + knownfiles.remove(fn) > + else: > + try: > + os.remove(fn) > + bb.warn('Removed old package %s' % fn) > + except OSError as e: > + if e.errno == errno.ENOENT: > + pass > + except IOError as e: > + if e.errno == errno.ENOENT: > + pass > + > + # Create new manifest > + with open(pcmanifest, 'w') as f: > + for pkgname, pkgbasename, srcpath, oldfile, destpath in files: > + if pkgname in copypkgs: > + bb.warn('Copying %s' % pkgbasename) > + destdir = os.path.dirname(destpath) > + bb.utils.mkdirhier(destdir) > + if oldfile: > + try: > + os.remove(oldfile) > + except OSError as e: > + if e.errno == errno.ENOENT: > + pass > + if (os.stat(srcpath).st_dev == os.stat(destdir).st_dev): > + # Use a hard link to save space > + os.link(srcpath, destpath) > + else: > + shutil.copyfile(srcpath, destpath) > + f.write('%s\n' % destpath) > + else: > + bb.warn('Not copying %s' % pkgbasename) > + f.write('%s\n' % oldfile) > + > + > +do_cleanall_append() { > + import errno > + for pkgclass in (d.getVar('PACKAGE_CLASSES', True) or '').split(): > + if pkgclass.startswith('package_'): > + pkgtype = pkgclass.split('_', 1)[1] > + deploydir = d.getVar('DEPLOY_DIR_%s' % pkgtype.upper(), True) > + prepath = deploydir + '-prediff' > + pcmanifest = os.path.join(prepath, d.expand('pkg-compare-manifest-${MULTIMACH_TARGET_SYS}-${PN}')) > + try: > + with open(pcmanifest, 'r') as f: > + for line in f: > + fn = line.rstrip() > + if fn: > + try: > + os.remove(fn) > + except OSError as e: > + if e.errno == errno.ENOENT: > + pass > + os.remove(pcmanifest) > + except IOError as e: > + if e.errno == errno.ENOENT: > + pass > +} > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH 3/3] WIP: classes/packagefeed-stability: add class to help reduce package feed churn 2015-09-08 14:09 ` Mark Hatle @ 2015-09-08 15:03 ` Paul Eggleton 2015-09-08 15:47 ` Mark Hatle 0 siblings, 1 reply; 7+ messages in thread From: Paul Eggleton @ 2015-09-08 15:03 UTC (permalink / raw) To: Mark Hatle; +Cc: openembedded-core Hi Mark, On Tuesday 08 September 2015 09:09:34 Mark Hatle wrote: > I've got some scripts where I do this externally of the build system. So a > few comments on the stuff below based on what I found working on those > scripts. > > (Basically the system deploys as normal.. then we run the scripts to copy > the deploy directory contents to a release directory... we do the > build-compare as part of that copy...) Ah ok, interesting. I meant to mention as part of the cover letter actually that Randy and I had considered the approach of doing it separately and Randy actually got some way down the path of implementing it, until I realised that it would be problematic in that the images you'd generate out of the same process as the package feed would have the newer packages in them instead of the ones that were in the feed. > See below inline... > > On 9/8/15 8:41 AM, Paul Eggleton wrote: > > When a dependency causes a recipe to effectively be rebuilt, its output > > may in fact not change; but new packages (with an increased PR value, if > > using the PR server) will be generated nonetheless. There's no practical > > way for us to predict whether or not this is going to be the case based > > solely on the inputs, but we can compare the package output and see if > > that is materially different and based upon that decide to replace the > > old package with the new one. > > > > This class effectively intercepts packages as they are written out by > > do_package_write_*, causing them to be written into a different > > directory where we can compare them to whatever older packages might > > be in the "real" package feed directory, and avoid copying the new > > package to the feed if it has not materially changed. We use > > build-compare to do the package comparison. > > > > (NOTE: this is still a work in progress and there are no doubt > > unresolved issues.) > > > > Signed-off-by: Paul Eggleton <paul.eggleton@linux.intel.com> > > --- > > > > meta/classes/packagefeed-stability.bbclass | 270 > > +++++++++++++++++++++++++++++ 1 file changed, 270 insertions(+) > > create mode 100644 meta/classes/packagefeed-stability.bbclass > > > > diff --git a/meta/classes/packagefeed-stability.bbclass > > b/meta/classes/packagefeed-stability.bbclass new file mode 100644 > > index 0000000..b5207d9 > > --- /dev/null > > +++ b/meta/classes/packagefeed-stability.bbclass > > @@ -0,0 +1,270 @@ > > +# Class to avoid copying packages into the feed if they haven't > > materially changed +# > > +# Copyright (C) 2015 Intel Corporation > > +# Released under the MIT license (see COPYING.MIT for details) > > +# > > +# This class effectively intercepts packages as they are written out by > > +# do_package_write_*, causing them to be written into a different > > +# directory where we can compare them to whatever older packages might > > +# be in the "real" package feed directory, and avoid copying the new > > +# package to the feed if it has not materially changed. The idea is to > > +# avoid unnecessary churn in the packages when dependencies trigger task > > +# reexecution (and thus repackaging). Enabling the class is simple: > > +# > > +# INHERIT += "packagefeed-stability" > > +# > > +# Caveats: > > +# 1) Latest PR values in the build system may not match those in packages > > +# seen on the target (naturally) > > +# 2) If you rebuild from sstate without the existing package feed > > present, > > +# you will lose the "state" of the package feed i.e. the preserved old > > +# package versions. Not the end of the world, but would negate the > > +# entire purpose of this class. > > +# > > +# Note that running -c cleanall on a recipe will purposely delete the old > > +# package files so they will definitely be copied the next time. > > + > > +python() { > > + # Package backend agnostic intercept > > + # This assumes that the package_write task is called > > package_write_<pkgtype> + # and that the directory in which packages > > should be written is + # pointed to by the variable > > DEPLOY_DIR_<PKGTYPE> > > + for pkgclass in (d.getVar('PACKAGE_CLASSES', True) or '').split(): > > + if pkgclass.startswith('package_'): > > + pkgtype = pkgclass.split('_', 1)[1] > > + pkgwritefunc = 'do_package_write_%s' % pkgtype > > + sstate_outputdirs = d.getVarFlag(pkgwritefunc, > > 'sstate-outputdirs', False) + deploydirvar = 'DEPLOY_DIR_%s' % > > pkgtype.upper() > > + deploydirvarref = '${' + deploydirvar + '}' > > + pkgcomparefunc = 'do_package_compare_%s' % pkgtype > > + > > + if bb.data.inherits_class('image', d): > > + d.appendVarFlag('do_rootfs', 'recrdeptask', ' ' + > > pkgcomparefunc) + > > + if bb.data.inherits_class('populate_sdk_base', d): > > + d.appendVarFlag('do_populate_sdk', 'recrdeptask', ' ' + > > pkgcomparefunc) + > > + if bb.data.inherits_class('populate_sdk_ext', d): > > + d.appendVarFlag('do_populate_sdk_ext', 'recrdeptask', ' ' > > + pkgcomparefunc) + > > + d.appendVarFlag('do_build', 'recrdeptask', ' ' + > > pkgcomparefunc) + > > + if d.getVarFlag(pkgwritefunc, 'noexec', True) or (not > > d.getVarFlag(pkgwritefunc, 'task', True)) or pkgwritefunc in > > (d.getVar('__BBDELTASKS', True) or []): + # Packaging is > > disabled for this recipe, we shouldn't do anything + > > continue > > + > > + if deploydirvarref in sstate_outputdirs: > > + # Set intermediate output directory > > + d.setVarFlag(pkgwritefunc, 'sstate-outputdirs', > > sstate_outputdirs.replace(deploydirvarref, deploydirvarref + '-prediff')) > > + > > + d.setVar(pkgcomparefunc, d.getVar('do_package_compare', > > False)) + d.setVarFlags(pkgcomparefunc, > > d.getVarFlags('do_package_compare', False)) + > > d.appendVarFlag(pkgcomparefunc, 'depends', ' > > build-compare-native:do_populate_sysroot') + > > bb.build.addtask(pkgcomparefunc, 'do_build', 'do_packagedata ' + > > pkgwritefunc, d) +} > > + > > +# This isn't the real task function - it's a template that we use in the > > +# anonymous python code above > > +fakeroot python do_package_compare () { > > + currenttask = d.getVar('BB_CURRENTTASK', True) > > + pkgtype = currenttask.rsplit('_', 1)[1] > > + package_compare_impl(pkgtype, d) > > +} > > + > > +def package_compare_impl(pkgtype, d): > > + import errno > > + import fnmatch > > + import glob > > + import subprocess > > + import oe.sstatesig > > + > > + pn = d.getVar('PN', True) > > + deploydir = d.getVar('DEPLOY_DIR_%s' % pkgtype.upper(), True) > > + prepath = deploydir + '-prediff/' > > + > > + # Find out PKGR values are > > + pkgdatadir = d.getVar('PKGDATA_DIR', True) > > + packages = [] > > + try: > > + with open(os.path.join(pkgdatadir, pn), 'r') as f: > > + for line in f: > > + if line.startswith('PACKAGES:'): > > + packages = line.split(':', 1)[1].split() > > + break > > + except IOError as e: > > + if e.errno == errno.ENOENT: > > + pass > > + > > + if not packages: > > + bb.debug(2, '%s: no packages, nothing to do' % pn) > > + return > > + > > + pkgrvalues = {} > > + rpkgnames = {} > > + rdepends = {} > > + pkgvvalues = {} > > + for pkg in packages: > > + with open(os.path.join(pkgdatadir, 'runtime', pkg), 'r') as f: > > + for line in f: > > + if line.startswith('PKGR:'): > > + pkgrvalues[pkg] = line.split(':', 1)[1].strip() > > + if line.startswith('PKGV:'): > > + pkgvvalues[pkg] = line.split(':', 1)[1].strip() > > + elif line.startswith('PKG_%s:' % pkg): > > + rpkgnames[pkg] = line.split(':', 1)[1].strip() > > + elif line.startswith('RDEPENDS_%s:' % pkg): > > + rdepends[pkg] = line.split(':', 1)[1].strip() > > + > > + # Prepare a list of the runtime package names for packages that were > > + # actually produced > > + rpkglist = [] > > + for pkg, rpkg in rpkgnames.iteritems(): > > + if os.path.exists(os.path.join(pkgdatadir, 'runtime', pkg + > > '.packaged')): + rpkglist.append((rpkg, pkg)) > > + rpkglist.sort(key=lambda x: len(x[0]), reverse=True) > > + > > + pvu = d.getVar('PV', False) > > + if '$' + '{SRCPV}' in pvu: > > + pvprefix = pvu.split('$' + '{SRCPV}', 1)[0] > > + else: > > + pvprefix = None > > + > > + pkgwritetask = 'package_write_%s' % pkgtype > > + files = [] > > + copypkgs = [] > > + manifest, _ = oe.sstatesig.sstate_get_manifest_filename(pkgwritetask, > > d) + with open(manifest, 'r') as f: > > + for line in f: > > > + if line.startswith(prepath): > > Is this comparing the -packages- or the -files-? I ask, because if you > process the packages you can short-cut some of the comparisons.. it makes a > huge improvement in speed. Well the manifest for do_package_write is each package file that it writes out, so it is packages and not their contents (at this point in the code). > Specifically, skip processing all '-dev' and '-dbg' packages. (Not > -staticdev). The -dev and -dbg components need to match the runtime items. > If the runtime items have no changed then there is no reason that the -dev > and -dbg items would have changed -- other then timestamps and file paths.. > which the build-compare filters out anyway. For -dbg I guess I was considering any case where debugging symbol generation had changed in some way; I'm not familiar enough with the format of those symbols to know whether that could really happen in practice or not. For -dev, what if someone patches a header alone (would be unusual, but not impossible to imagine)? > > + srcpath = line.rstrip() > > + if os.path.isfile(srcpath): > > + destpath = os.path.join(deploydir, > > os.path.relpath(srcpath, prepath)) + > > + # This is crude but should work assuming the output > > + # package file name starts with the package name > > + # and rpkglist is sorted by length (descending) > > + pkgbasename = os.path.basename(destpath) > > + pkgname = None > > + for rpkg, pkg in rpkglist: > > + if pkgbasename.startswith(rpkg): > > + pkgr = pkgrvalues[pkg] > > + destpathspec = destpath.replace(pkgr, '*') > > + if pvprefix: > > + pkgv = pkgvvalues[pkg] > > + if pkgv.startswith(pvprefix): > > + pkgvsuffix = pkgv[len(pvprefix):] > > + if '+' in pkgvsuffix: > > + newpkgv = pvprefix + '*+' + > > pkgvsuffix.split('+', 1)[1] + > > destpathspec = destpathspec.replace(pkgv, newpkgv) + > > pkgname = pkg > > + break > > + else: > > + bb.warn('Unable to map %s back to package' % > > pkgbasename) + destpathspec = destpath > > + > > + oldfiles = glob.glob(destpathspec) > > + oldfile = None > > + docopy = True > > + if oldfiles: > > + oldfile = oldfiles[-1] > > + result = subprocess.call(['pkg-diff.sh', oldfile, > > srcpath]) + if result == 0: > > + docopy = False > > + > > If any of the packages differ, then -all- of the packages differ. You can > drop out of the loop at that point and simply copy the full set of packages > over at this point. Right, that would be a lot simpler. I had imagined there might be an advantage in terms of reduced on-target upgrading if you'd only for example patched the source for one or two kernel modules and then rebuilt the kernel, but that might not justify the extra complexity. > The reason for this is that the packages from a single recipe may (and often > do) have PN-PV-PR dependencies in them -- so they have to be kept as a > unit. Also the -dbg package contains the overall debug information for the > full set of packages produced by a recipe.. so if you change a package the > -dbg has to be updated -- since the -dbg was updated, everything has to be > updated. That's why I had the versioned dependency check, though it would be much simpler (and faster) not to do any of that and just keep it all in lock-step. > So drop the loop, abort the compares and copy the full manifest set over in > one shot.. For large packages like perl and python it makes a HUGH > difference to avoid all of the extra comparisons. > > (I also did this because there is some concern in my mind if releasing only > part of a recipe build ever makes sense.. because I've not gotten into the > situation where the only way to reproduce the released feed would be to > build the old version first, then come in and build the new version next.. > this isn't a typical workflow and greatly complicates things.) Right, there is a potential risk (and potential for confusion) there. > > + files.append((pkgname, pkgbasename, srcpath, oldfile, > > destpath)) + bb.debug(2, '%s: package %s %s' % (pn, > > files[-1], docopy)) + if docopy: > > + copypkgs.append(pkgname) > > + > > Making the change above would eliminate the need for the code below.. and > reduce the possibility of things getting out of sync. Indeed. Cheers, Paul -- Paul Eggleton Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH 3/3] WIP: classes/packagefeed-stability: add class to help reduce package feed churn 2015-09-08 15:03 ` Paul Eggleton @ 2015-09-08 15:47 ` Mark Hatle 0 siblings, 0 replies; 7+ messages in thread From: Mark Hatle @ 2015-09-08 15:47 UTC (permalink / raw) To: Paul Eggleton; +Cc: openembedded-core On 9/8/15 10:03 AM, Paul Eggleton wrote: > Hi Mark, > > On Tuesday 08 September 2015 09:09:34 Mark Hatle wrote: >> I've got some scripts where I do this externally of the build system. So a >> few comments on the stuff below based on what I found working on those >> scripts. >> >> (Basically the system deploys as normal.. then we run the scripts to copy >> the deploy directory contents to a release directory... we do the >> build-compare as part of that copy...) > > Ah ok, interesting. I meant to mention as part of the cover letter actually > that Randy and I had considered the approach of doing it separately and Randy > actually got some way down the path of implementing it, until I realised that > it would be problematic in that the images you'd generate out of the same > process as the package feed would have the newer packages in them instead of > the ones that were in the feed. Yes, certainly an issue with image generation needing to match. >> See below inline... >> >> On 9/8/15 8:41 AM, Paul Eggleton wrote: >>> When a dependency causes a recipe to effectively be rebuilt, its output >>> may in fact not change; but new packages (with an increased PR value, if >>> using the PR server) will be generated nonetheless. There's no practical >>> way for us to predict whether or not this is going to be the case based >>> solely on the inputs, but we can compare the package output and see if >>> that is materially different and based upon that decide to replace the >>> old package with the new one. >>> >>> This class effectively intercepts packages as they are written out by >>> do_package_write_*, causing them to be written into a different >>> directory where we can compare them to whatever older packages might >>> be in the "real" package feed directory, and avoid copying the new >>> package to the feed if it has not materially changed. We use >>> build-compare to do the package comparison. >>> >>> (NOTE: this is still a work in progress and there are no doubt >>> unresolved issues.) >>> >>> Signed-off-by: Paul Eggleton <paul.eggleton@linux.intel.com> >>> --- >>> >>> meta/classes/packagefeed-stability.bbclass | 270 >>> +++++++++++++++++++++++++++++ 1 file changed, 270 insertions(+) >>> create mode 100644 meta/classes/packagefeed-stability.bbclass >>> >>> diff --git a/meta/classes/packagefeed-stability.bbclass >>> b/meta/classes/packagefeed-stability.bbclass new file mode 100644 >>> index 0000000..b5207d9 >>> --- /dev/null >>> +++ b/meta/classes/packagefeed-stability.bbclass >>> @@ -0,0 +1,270 @@ >>> +# Class to avoid copying packages into the feed if they haven't >>> materially changed +# >>> +# Copyright (C) 2015 Intel Corporation >>> +# Released under the MIT license (see COPYING.MIT for details) >>> +# >>> +# This class effectively intercepts packages as they are written out by >>> +# do_package_write_*, causing them to be written into a different >>> +# directory where we can compare them to whatever older packages might >>> +# be in the "real" package feed directory, and avoid copying the new >>> +# package to the feed if it has not materially changed. The idea is to >>> +# avoid unnecessary churn in the packages when dependencies trigger task >>> +# reexecution (and thus repackaging). Enabling the class is simple: >>> +# >>> +# INHERIT += "packagefeed-stability" >>> +# >>> +# Caveats: >>> +# 1) Latest PR values in the build system may not match those in packages >>> +# seen on the target (naturally) >>> +# 2) If you rebuild from sstate without the existing package feed >>> present, >>> +# you will lose the "state" of the package feed i.e. the preserved old >>> +# package versions. Not the end of the world, but would negate the >>> +# entire purpose of this class. >>> +# >>> +# Note that running -c cleanall on a recipe will purposely delete the old >>> +# package files so they will definitely be copied the next time. >>> + >>> +python() { >>> + # Package backend agnostic intercept >>> + # This assumes that the package_write task is called >>> package_write_<pkgtype> + # and that the directory in which packages >>> should be written is + # pointed to by the variable >>> DEPLOY_DIR_<PKGTYPE> >>> + for pkgclass in (d.getVar('PACKAGE_CLASSES', True) or '').split(): >>> + if pkgclass.startswith('package_'): >>> + pkgtype = pkgclass.split('_', 1)[1] >>> + pkgwritefunc = 'do_package_write_%s' % pkgtype >>> + sstate_outputdirs = d.getVarFlag(pkgwritefunc, >>> 'sstate-outputdirs', False) + deploydirvar = 'DEPLOY_DIR_%s' % >>> pkgtype.upper() >>> + deploydirvarref = '${' + deploydirvar + '}' >>> + pkgcomparefunc = 'do_package_compare_%s' % pkgtype >>> + >>> + if bb.data.inherits_class('image', d): >>> + d.appendVarFlag('do_rootfs', 'recrdeptask', ' ' + >>> pkgcomparefunc) + >>> + if bb.data.inherits_class('populate_sdk_base', d): >>> + d.appendVarFlag('do_populate_sdk', 'recrdeptask', ' ' + >>> pkgcomparefunc) + >>> + if bb.data.inherits_class('populate_sdk_ext', d): >>> + d.appendVarFlag('do_populate_sdk_ext', 'recrdeptask', ' ' >>> + pkgcomparefunc) + >>> + d.appendVarFlag('do_build', 'recrdeptask', ' ' + >>> pkgcomparefunc) + >>> + if d.getVarFlag(pkgwritefunc, 'noexec', True) or (not >>> d.getVarFlag(pkgwritefunc, 'task', True)) or pkgwritefunc in >>> (d.getVar('__BBDELTASKS', True) or []): + # Packaging is >>> disabled for this recipe, we shouldn't do anything + >>> continue >>> + >>> + if deploydirvarref in sstate_outputdirs: >>> + # Set intermediate output directory >>> + d.setVarFlag(pkgwritefunc, 'sstate-outputdirs', >>> sstate_outputdirs.replace(deploydirvarref, deploydirvarref + '-prediff')) >>> + >>> + d.setVar(pkgcomparefunc, d.getVar('do_package_compare', >>> False)) + d.setVarFlags(pkgcomparefunc, >>> d.getVarFlags('do_package_compare', False)) + >>> d.appendVarFlag(pkgcomparefunc, 'depends', ' >>> build-compare-native:do_populate_sysroot') + >>> bb.build.addtask(pkgcomparefunc, 'do_build', 'do_packagedata ' + >>> pkgwritefunc, d) +} >>> + >>> +# This isn't the real task function - it's a template that we use in the >>> +# anonymous python code above >>> +fakeroot python do_package_compare () { >>> + currenttask = d.getVar('BB_CURRENTTASK', True) >>> + pkgtype = currenttask.rsplit('_', 1)[1] >>> + package_compare_impl(pkgtype, d) >>> +} >>> + >>> +def package_compare_impl(pkgtype, d): >>> + import errno >>> + import fnmatch >>> + import glob >>> + import subprocess >>> + import oe.sstatesig >>> + >>> + pn = d.getVar('PN', True) >>> + deploydir = d.getVar('DEPLOY_DIR_%s' % pkgtype.upper(), True) >>> + prepath = deploydir + '-prediff/' >>> + >>> + # Find out PKGR values are >>> + pkgdatadir = d.getVar('PKGDATA_DIR', True) >>> + packages = [] >>> + try: >>> + with open(os.path.join(pkgdatadir, pn), 'r') as f: >>> + for line in f: >>> + if line.startswith('PACKAGES:'): >>> + packages = line.split(':', 1)[1].split() >>> + break >>> + except IOError as e: >>> + if e.errno == errno.ENOENT: >>> + pass >>> + >>> + if not packages: >>> + bb.debug(2, '%s: no packages, nothing to do' % pn) >>> + return >>> + >>> + pkgrvalues = {} >>> + rpkgnames = {} >>> + rdepends = {} >>> + pkgvvalues = {} >>> + for pkg in packages: >>> + with open(os.path.join(pkgdatadir, 'runtime', pkg), 'r') as f: >>> + for line in f: >>> + if line.startswith('PKGR:'): >>> + pkgrvalues[pkg] = line.split(':', 1)[1].strip() >>> + if line.startswith('PKGV:'): >>> + pkgvvalues[pkg] = line.split(':', 1)[1].strip() >>> + elif line.startswith('PKG_%s:' % pkg): >>> + rpkgnames[pkg] = line.split(':', 1)[1].strip() >>> + elif line.startswith('RDEPENDS_%s:' % pkg): >>> + rdepends[pkg] = line.split(':', 1)[1].strip() >>> + >>> + # Prepare a list of the runtime package names for packages that were >>> + # actually produced >>> + rpkglist = [] >>> + for pkg, rpkg in rpkgnames.iteritems(): >>> + if os.path.exists(os.path.join(pkgdatadir, 'runtime', pkg + >>> '.packaged')): + rpkglist.append((rpkg, pkg)) >>> + rpkglist.sort(key=lambda x: len(x[0]), reverse=True) >>> + >>> + pvu = d.getVar('PV', False) >>> + if '$' + '{SRCPV}' in pvu: >>> + pvprefix = pvu.split('$' + '{SRCPV}', 1)[0] >>> + else: >>> + pvprefix = None >>> + >>> + pkgwritetask = 'package_write_%s' % pkgtype >>> + files = [] >>> + copypkgs = [] >>> + manifest, _ = oe.sstatesig.sstate_get_manifest_filename(pkgwritetask, >>> d) + with open(manifest, 'r') as f: >>> + for line in f: >> >>> + if line.startswith(prepath): >> >> Is this comparing the -packages- or the -files-? I ask, because if you >> process the packages you can short-cut some of the comparisons.. it makes a >> huge improvement in speed. > > Well the manifest for do_package_write is each package file that it writes out, > so it is packages and not their contents (at this point in the code). > >> Specifically, skip processing all '-dev' and '-dbg' packages. (Not >> -staticdev). The -dev and -dbg components need to match the runtime items. >> If the runtime items have no changed then there is no reason that the -dev >> and -dbg items would have changed -- other then timestamps and file paths.. >> which the build-compare filters out anyway. > > For -dbg I guess I was considering any case where debugging symbol generation > had changed in some way; I'm not familiar enough with the format of those > symbols to know whether that could really happen in practice or not. For -dev, > what if someone patches a header alone (would be unusual, but not impossible > to imagine)? dbg symbols should not be able to change (other then file paths which are ignored), otherwise something else has changed and the other pieces of the recipe will know that. As for the -dev, it would be possible to update a header or other component that does not in the end affect the produced binaries. I believe in a released system that will be extraordinarily rare corner case... however it is something that can certainly happen. In our system we decided to handle that corner case through a manual copy, but for OE, only ignoring -dbg should be reasonable. >>> + srcpath = line.rstrip() >>> + if os.path.isfile(srcpath): >>> + destpath = os.path.join(deploydir, >>> os.path.relpath(srcpath, prepath)) + >>> + # This is crude but should work assuming the output >>> + # package file name starts with the package name >>> + # and rpkglist is sorted by length (descending) >>> + pkgbasename = os.path.basename(destpath) >>> + pkgname = None >>> + for rpkg, pkg in rpkglist: >>> + if pkgbasename.startswith(rpkg): >>> + pkgr = pkgrvalues[pkg] >>> + destpathspec = destpath.replace(pkgr, '*') >>> + if pvprefix: >>> + pkgv = pkgvvalues[pkg] >>> + if pkgv.startswith(pvprefix): >>> + pkgvsuffix = pkgv[len(pvprefix):] >>> + if '+' in pkgvsuffix: >>> + newpkgv = pvprefix + '*+' + >>> pkgvsuffix.split('+', 1)[1] + >>> destpathspec = destpathspec.replace(pkgv, newpkgv) + >>> pkgname = pkg >>> + break >>> + else: >>> + bb.warn('Unable to map %s back to package' % >>> pkgbasename) + destpathspec = destpath >>> + >>> + oldfiles = glob.glob(destpathspec) >>> + oldfile = None >>> + docopy = True >>> + if oldfiles: >>> + oldfile = oldfiles[-1] >>> + result = subprocess.call(['pkg-diff.sh', oldfile, >>> srcpath]) + if result == 0: >>> + docopy = False >>> + >> >> If any of the packages differ, then -all- of the packages differ. You can >> drop out of the loop at that point and simply copy the full set of packages >> over at this point. > > Right, that would be a lot simpler. I had imagined there might be an advantage > in terms of reduced on-target upgrading if you'd only for example patched the > source for one or two kernel modules and then rebuilt the kernel, but that > might not justify the extra complexity. We found that typically this just doesn't happen. Between embedded kernel version numbers, built in PN-PV-PR dependencies and other 'safety' mechanisms.. you really do have to publish all or nothing. Large package sets (like the kernel, python, perl) would be a place for optimization.. but I don't think it's a good idea to subset a recipe. It really makes regeneration of what you had very difficult without "repeating history" (i.e. the steps you took to get there) and that makes for a bad build system experience. (Note there is still some of that since the deploy directory and sstate-cache aren't in sync.. but I think it's easier to manage then partial recipes updates.) >> The reason for this is that the packages from a single recipe may (and often >> do) have PN-PV-PR dependencies in them -- so they have to be kept as a >> unit. Also the -dbg package contains the overall debug information for the >> full set of packages produced by a recipe.. so if you change a package the >> -dbg has to be updated -- since the -dbg was updated, everything has to be >> updated. > > That's why I had the versioned dependency check, though it would be much > simpler (and faster) not to do any of that and just keep it all in lock-step. > >> So drop the loop, abort the compares and copy the full manifest set over in >> one shot.. For large packages like perl and python it makes a HUGH >> difference to avoid all of the extra comparisons. >> >> (I also did this because there is some concern in my mind if releasing only >> part of a recipe build ever makes sense.. because I've not gotten into the >> situation where the only way to reproduce the released feed would be to >> build the old version first, then come in and build the new version next.. >> this isn't a typical workflow and greatly complicates things.) > > Right, there is a potential risk (and potential for confusion) there. > >>> + files.append((pkgname, pkgbasename, srcpath, oldfile, >>> destpath)) + bb.debug(2, '%s: package %s %s' % (pn, >>> files[-1], docopy)) + if docopy: >>> + copypkgs.append(pkgname) >>> + >> >> Making the change above would eliminate the need for the code below.. and >> reduce the possibility of things getting out of sync. > > Indeed. > > Cheers, > Paul > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-09-08 15:47 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-09-08 13:41 [RFC PATCH 0/3] WIP: add ability to reduce package feed churn Paul Eggleton 2015-09-08 13:41 ` [RFC PATCH 1/3] classes/sstate: break out function to get sstate manifest filename Paul Eggleton 2015-09-08 13:41 ` [RFC PATCH 2/3] build-compare: add support for examining deb and ipk packages Paul Eggleton 2015-09-08 13:41 ` [RFC PATCH 3/3] WIP: classes/packagefeed-stability: add class to help reduce package feed churn Paul Eggleton 2015-09-08 14:09 ` Mark Hatle 2015-09-08 15:03 ` Paul Eggleton 2015-09-08 15:47 ` Mark Hatle
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox