* Over-pruning the sstate cache
From: Mike Crowe @ 2016-03-29 12:56 UTC
To: openembedded-core

80b3974081c4a8c604e23982a6db8fb32c616058 implies that at least some people
are pruning the sstate cache based on file access time.

We run incremental and nightly Jenkins jobs that build images for various
targets and branches in order to keep the sstate-cache populated. Files are
pruned once they haven't been accessed for a few days. This has worked
reasonably well for a few years (and the script can be simplified now since
the above commit.)

Recently we've found that files that are still required are being pruned.
This appears to be due to a combination of improvements to oe-core to avoid
unnecessary tasks and improvements to our own recipes. These have resulted
in it being possible to build an image without requiring the
populate_sysroot.tgz files if nothing has changed that needs building.

Is there a recommended way to ensure that all the sstate cache files are
touched, even those that are not actually required to build the image
currently due to task optimisation?

The only solution I can come up with is to invent a recursive
"all_populate_sysroot" recrdeptask that depends on the individual
populate_sysroot tasks and run that task for each image.

Does anyone have any better ideas?

Mike.
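For readers unfamiliar with this style of pruning, here is a minimal sketch of an access-time-based expiry pass. It is not the script used on the Jenkins jobs described above (that script was not posted); the function name `find_stale` and its parameters are hypothetical, and it only reports candidates rather than deleting them.

```python
import os
import time


def find_stale(cache_dir, max_age_days, now=None):
    """Return files under cache_dir not accessed for max_age_days (hypothetical helper)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            # st_atime is only meaningful if the filesystem records access
            # times (i.e. it is not mounted with noatime).
            if os.stat(path).st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

Anything such a pass reports is a file that no build has read recently, which is exactly why a build that skips the populate_sysroot archives lets them age out.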
* Re: Over-pruning the sstate cache
From: Richard Purdie @ 2016-03-29 14:11 UTC
To: Mike Crowe, openembedded-core

On Tue, 2016-03-29 at 13:56 +0100, Mike Crowe wrote:
> 80b3974081c4a8c604e23982a6db8fb32c616058 implies that at least some
> people are pruning the sstate cache based on file access time.
>
> We run incremental and nightly Jenkins jobs that build images for
> various targets and branches in order to keep the sstate-cache
> populated. Files are pruned once they haven't been accessed for a few
> days. This has worked reasonably well for a few years (and the script
> can be simplified now since the above commit.)
>
> Recently we've found that files that are still required are being
> pruned. This appears to be due to a combination of improvements to
> oe-core to avoid unnecessary tasks and improvements to our own recipes.
> These have resulted in it being possible to build an image without
> requiring the populate_sysroot.tgz files if nothing has changed that
> needs building.
>
> Is there a recommended way to ensure that all the sstate cache files
> are touched, even those that are not actually required to build the
> image currently due to task optimisation?
>
> The only solution I can come up with is to invent a recursive
> "all_populate_sysroot" recrdeptask that depends on the individual
> populate_sysroot tasks and run that task for each image.
>
> Does anyone have any better ideas?

generate the "locked-sigs" inc file (bitbake XXX -S none) and then with
a script touch all the objects listed in that file?

Cheers,

Richard
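As a rough illustration of the first half of that suggestion, the sketch below extracts (arch, recipe, task, hash) tuples from a locked-sigs file. The file layout is an assumption inferred from the regular expressions in the touch-sstate script posted later in this thread, not from BitBake documentation, and `parse_locked_sigs` is a hypothetical name.

```python
import re

# Assumed layout, per the touch-sstate script later in the thread:
#   SIGGEN_LOCKEDSIGS_t-<arch> = "\
#       <pn>:<task>:<hash> \
#       "
VARIABLE_RE = re.compile(r'^SIGGEN_LOCKEDSIGS_t-(\S+) = "\\$')
SIG_RE = re.compile(r'^\s*([^:\s]+):([^:]+):([^:\s]+) \\$')


def parse_locked_sigs(lines):
    """Return a list of (arch, pn, task, sig) tuples (hypothetical helper)."""
    arch = None
    entries = []
    for line in lines:
        line = line.rstrip()
        variable_match = VARIABLE_RE.match(line)
        if variable_match:
            # A new per-architecture block starts here.
            arch = variable_match.group(1)
            continue
        sig_match = SIG_RE.match(line)
        if sig_match and arch is not None:
            pn, task, sig = sig_match.groups()
            entries.append((arch, pn, task, sig))
    return entries
```

Each tuple carries the hash needed to locate the matching sstate object; lines that match neither pattern are silently skipped here, whereas a stricter script might prefer to fail on them.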
* Re: Over-pruning the sstate cache
From: Mike Crowe @ 2016-03-30 13:05 UTC
To: Richard Purdie, openembedded-core

On Tuesday 29 March 2016 at 15:11:10 +0100, Richard Purdie wrote:
> On Tue, 2016-03-29 at 13:56 +0100, Mike Crowe wrote:
> > Is there a recommended way to ensure that all the sstate cache files
> > are touched, even those that are not actually required to build the
> > image currently due to task optimisation?

> generate the "locked-sigs" inc file (bitbake XXX -S none) and then with
> a script touch all the objects listed in that file?

That looks like it might work. There's a bit of messing about required for
the distribution name in the native packages but I can probably deal with
that.

Thanks.

Mike.
* Re: Over-pruning the sstate cache
From: Mike Crowe @ 2016-04-12 18:51 UTC
To: Richard Purdie; +Cc: Mike Crowe, openembedded-core

On Tuesday 29 March 2016 at 15:11:10 +0100, Richard Purdie wrote:
> On Tue, 2016-03-29 at 13:56 +0100, Mike Crowe wrote:
> > 80b3974081c4a8c604e23982a6db8fb32c616058 implies that at least some
> > people are pruning the sstate cache based on file access time.
> >
> > Is there a recommended way to ensure that all the sstate cache files
> > are touched, even those that are not actually required to build the
> > image currently due to task optimisation?
> >
> > Does anyone have any better ideas?
>
> generate the "locked-sigs" inc file (bitbake XXX -S none) and then with
> a script touch all the objects listed in that file?

I'm most of the way through writing a script to do this. I've discovered
that the sstate filenames contain bits that aren't in the locked-sigs file
such as ${PV}, ${PR}, ${TARGET_VENDOR}, ${TARGET_OS}, ${SSTATE_VERSION}.
The hash is the important bit for identifying the file uniquely so these
bits can either be hard coded or wildcarded as appropriate.

There's also the need to apply the native OS name prefix for native
packages.

Is there a way of getting hold of those bits so I can avoid the wildcards?

Thanks.

Mike.
* Re: Over-pruning the sstate cache
From: Richard Purdie @ 2016-04-12 20:50 UTC
To: Mike Crowe; +Cc: openembedded-core

On Tue, 2016-04-12 at 19:51 +0100, Mike Crowe wrote:
> On Tuesday 29 March 2016 at 15:11:10 +0100, Richard Purdie wrote:
> > On Tue, 2016-03-29 at 13:56 +0100, Mike Crowe wrote:
> > > 80b3974081c4a8c604e23982a6db8fb32c616058 implies that at least
> > > some people are pruning the sstate cache based on file access time.
> > >
> > > Is there a recommended way to ensure that all the sstate cache
> > > files are touched, even those that are not actually required to
> > > build the image currently due to task optimisation?
> > >
> > > Does anyone have any better ideas?
> >
> > generate the "locked-sigs" inc file (bitbake XXX -S none) and then
> > with a script touch all the objects listed in that file?
>
> I'm most of the way through writing a script to do this. I've
> discovered that the sstate filenames contain bits that aren't in the
> locked-sigs file such as ${PV}, ${PR}, ${TARGET_VENDOR}, ${TARGET_OS},
> ${SSTATE_VERSION}. The hash is the important bit for identifying the
> file uniquely so these bits can either be hard coded or wildcarded as
> appropriate.
>
> There's also the need to apply native OS name prefix for native
> packages.
>
> Is there a way of getting hold of those bits so I can avoid the
> wildcards?

In theory the key part is the hash, all the other pieces are there just
to make human understandable filenames/layout (and would be encoded
into the hash in most cases). Whilst we could generate that info, I'm
not sure it would help much since the hashes should uniquely identify
the files?

Cheers,

Richard
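If the hash alone identifies the object, the remaining filename fields can simply be wildcarded. The sketch below shows one way to do that; the directory layout (a two-character hash prefix directory, optionally below a native distribution directory) and the filename shape are assumptions taken from the touch-sstate script posted later in this thread, and `sstate_candidates` is a hypothetical name.

```python
import glob
import os


def sstate_candidates(sstate_dir, pn, task, sig):
    """Find sstate archives for one locked-sigs entry, matching on the hash.

    Assumed layout (from the touch-sstate script in this thread):
      <sstate_dir>/[<native distro>/]<sig[:2]>/sstate:<pn>:...:<sig>_<task>.tgz
    """
    short_task = task[3:] if task.startswith("do_") else task
    # Everything between the recipe name and the hash is wildcarded.
    name = "sstate:{}:*:{}_{}.tgz".format(pn, sig, short_task)
    base = glob.escape(sstate_dir)
    patterns = [
        os.path.join(base, sig[:2], name),       # target objects
        os.path.join(base, "*", sig[:2], name),  # native objects
    ]
    found = []
    for pattern in patterns:
        found.extend(glob.glob(pattern))
    return sorted(found)
```

The trade-off is that a wildcard match costs a directory listing per entry, whereas a fully specified filename would not; with two-character prefix directories that listing is usually small.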
* Re: Over-pruning the sstate cache
From: Otavio Salvador @ 2016-04-13 13:47 UTC
To: Richard Purdie
Cc: Mike Crowe, Patches and discussions about the oe-core layer

On Tue, Apr 12, 2016 at 5:50 PM, Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
> On Tue, 2016-04-12 at 19:51 +0100, Mike Crowe wrote:
>> On Tuesday 29 March 2016 at 15:11:10 +0100, Richard Purdie wrote:
>> > On Tue, 2016-03-29 at 13:56 +0100, Mike Crowe wrote:
>> > > 80b3974081c4a8c604e23982a6db8fb32c616058 implies that at least
>> > > some people are pruning the sstate cache based on file access
>> > > time.
>> > >
>> > > Is there a recommended way to ensure that all the sstate cache
>> > > files are touched, even those that are not actually required to
>> > > build the image currently due to task optimisation?
>> > >
>> > > Does anyone have any better ideas?
>> >
>> > generate the "locked-sigs" inc file (bitbake XXX -S none) and then
>> > with a script touch all the objects listed in that file?
>>
>> I'm most of the way through writing a script to do this. I've
>> discovered that the sstate filenames contain bits that aren't in the
>> locked-sigs file such as ${PV}, ${PR}, ${TARGET_VENDOR}, ${TARGET_OS},
>> ${SSTATE_VERSION}. The hash is the important bit for identifying the
>> file uniquely so these bits can either be hard coded or wildcarded as
>> appropriate.
>>
>> There's also the need to apply native OS name prefix for native
>> packages.
>>
>> Is there a way of getting hold of those bits so I can avoid the
>> wildcards?
>
> In theory the key part is the hash, all the other pieces are there just
> to make human understandable filenames/layout (and would be encoded
> into the hash in most cases). Whilst we could generate that info, I'm
> not sure it would help much since the hashes should uniquely identify
> the files?

Couldn't this be done in a way similar to the fetchall task?

--
Otavio Salvador                             O.S. Systems
http://www.ossystems.com.br        http://code.ossystems.com.br
Mobile: +55 (53) 9981-7854            Mobile: +1 (347) 903-9750
* Re: Over-pruning the sstate cache
From: Mike Crowe @ 2016-04-13 14:01 UTC
To: Otavio Salvador; +Cc: Patches and discussions about the oe-core layer

On Wednesday 13 April 2016 at 10:47:13 -0300, Otavio Salvador wrote:
> On Tue, Apr 12, 2016 at 5:50 PM, Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
> > On Tue, 2016-04-12 at 19:51 +0100, Mike Crowe wrote:
> >> On Tuesday 29 March 2016 at 15:11:10 +0100, Richard Purdie wrote:
> >> > On Tue, 2016-03-29 at 13:56 +0100, Mike Crowe wrote:
> >> > > 80b3974081c4a8c604e23982a6db8fb32c616058 implies that at least
> >> > > some people are pruning the sstate cache based on file access
> >> > > time.
> >> > >
> >> > > Is there a recommended way to ensure that all the sstate cache
> >> > > files are touched, even those that are not actually required to
> >> > > build the image currently due to task optimisation?
> >> > >
> >> > > Does anyone have any better ideas?
> >> >
> >> > generate the "locked-sigs" inc file (bitbake XXX -S none) and then
> >> > with a script touch all the objects listed in that file?
> >>
> >> I'm most of the way through writing a script to do this. I've
> >> discovered that the sstate filenames contain bits that aren't in the
> >> locked-sigs file such as ${PV}, ${PR}, ${TARGET_VENDOR},
> >> ${TARGET_OS}, ${SSTATE_VERSION}. The hash is the important bit for
> >> identifying the file uniquely so these bits can either be hard coded
> >> or wildcarded as appropriate.
> >>
> >> There's also the need to apply native OS name prefix for native
> >> packages.
> >>
> >> Is there a way of getting hold of those bits so I can avoid the
> >> wildcards?
> >
> > In theory the key part is the hash, all the other pieces are there just
> > to make human understandable filenames/layout (and would be encoded
> > into the hash in most cases). Whilst we could generate that info, I'm
> > not sure it would help much since the hashes should uniquely identify
> > the files?
>
> Couldn't this be done in a way similar to the fetchall task?

That's the sort of thing I was thinking of with my "all_populate_sysroot"
task in my original question.

But I've a working script using Richard's method now. I'll share it once
I've tested it a bit more.

Mike.
* Re: Over-pruning the sstate cache
From: Otavio Salvador @ 2016-04-13 14:11 UTC
To: Mike Crowe; +Cc: Patches and discussions about the oe-core layer

On Wed, Apr 13, 2016 at 11:01 AM, Mike Crowe <mac@mcrowe.com> wrote:
> On Wednesday 13 April 2016 at 10:47:13 -0300, Otavio Salvador wrote:
>> Couldn't this be done in a way similar to the fetchall task?
>
> That's the sort of thing I was thinking of with my "all_populate_sysroot"
> task in my original question.
>
> But, I've a working script using Richard's method now. I'll share it once
> I've tested it a bit more.

Having it as a task allows it to avoid races and the like when running
inside BitBake. It can also reuse the metadata database information and
ought to be faster. Once you share the script I will see if I can rework
it into a task and see how it looks.

--
Otavio Salvador                             O.S. Systems
http://www.ossystems.com.br        http://code.ossystems.com.br
Mobile: +55 (53) 9981-7854            Mobile: +1 (347) 903-9750
* Re: Over-pruning the sstate cache
From: Mike Crowe @ 2016-04-13 15:27 UTC
To: Otavio Salvador
Cc: Mike Crowe, Patches and discussions about the oe-core layer

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]

On Wednesday 13 April 2016 at 11:11:11 -0300, Otavio Salvador wrote:
> On Wed, Apr 13, 2016 at 11:01 AM, Mike Crowe <mac@mcrowe.com> wrote:
> > On Wednesday 13 April 2016 at 10:47:13 -0300, Otavio Salvador wrote:
> >> Couldn't this to be done, similar to the fetchall task?
> >
> > That's the sort of thing I was thinking of with my "all_populate_sysroot"
> > task in my original question.
> >
> > But, I've a working script using Richard's method now. I'll share it once
> > I've tested it a bit more.
>
> Having it as a task allows it to avoid races and like, when running
> inside of the BitBake. It also can reuse the metadata database
> information and ought to be faster. Once you share the script I will
> see if I can rework it to a task and see how it looks.

I'm not really sure how much my (not very pretty) script will have in
common with a task-based solution. I agree that integrating it into Bitbake
would be better though. At the moment generating the locked-sigs.inc file
is by far the slowest part of the operation though.

Since you've requested it, here's the not-really-tested-properly script.

Mike.

[-- Attachment #2: touch-sstate --]
[-- Type: text/plain, Size: 3657 bytes --]

#!/usr/bin/python2
#
# touch-sstate
#
# Update atime on sstate-cache files for locked-sigs.inc file.
#
# Mike Crowe <mac@mcrowe.com>
#
# If nothing needs recompiling when building an image then bitbake
# won't access the sstate files that are required to populate the
# sysroot. This means that they might be expired as being unused. To
# avoid the problem this script processes the "locked-sigs.inc" file
# generated by running "bitbake XXX --dump-signatures=none".

import glob
import os
import re
import sys
import time

# These variables may need tweaking when the build environment changes
native_lsb_string = "Debian-8"
target_vendor = ""
target_os = "oe-linux-gnueabi"
host_os = "linux"
sstate_version = "3"
native_arches = [ 'x86-64' ]
sstate_cache_dir = "sstate-cache/"

# The start of a variable declaration that tells us the arch
variable_re = re.compile('^SIGGEN_LOCKEDSIGS_t-([^ ]*) = \"\\\\$')

# A single hash for a task
sig_re = re.compile('^ *([^:]+):([^:]+):([^:]+) \\\\$')

# Ignore the trailing quote, the final line and blank lines
ignore_re = re.compile('^( *\")|(SIGGEN_LOCKEDSIGS_TYPES_)|$')

arch1 = "noarch"
arch2 = "noarch"

def TouchMatches(pattern, deadline):
    matches = glob.glob(pattern)
    for match in matches:
        # It's likely that we won't need to touch the file so it's
        # cheaper to do the stat to check first than to always open
        # the file for reading. The inode might be in cache but the
        # file contents not.
        if os.stat(match).st_atime < deadline:
            print match
            # We can't use os.utime for this since we may not own the
            # file.
            f = open(match, 'r')
            # We actually have to read something from the file in
            # order to update atime.
            f.read(1)
            f.close()

# In order to avoid unnecessarily thrashing the disk we only want to
# modify the atime of the file if the atime is more than a day ago.
# This matches the behaviour of the "relatime" mount option. We allow
# a bit of leeway in case we take a long time to run and the expiry
# script wants to expire stuff that's only more than a day old though.
deadline = time.time() - 23 * 3600

line_number = 0
for line in open("locked-sigs.inc", "r"):
    line_number += 1
    line = line.rstrip()
    variable_match = variable_re.match(line)
    if variable_match:
        arch = variable_match.group(1)
        arch_underscore = arch.replace("-", "_")
        if arch in native_arches:
            dir = sstate_cache_dir + native_lsb_string + "/"
            arch1 = arch_underscore + "-" + host_os
            arch2 = arch_underscore
        else:
            dir = sstate_cache_dir
            arch1 = arch_underscore + target_vendor + "-" + target_os
            arch2 = arch_underscore
    else:
        sig_match = sig_re.match(line)
        if sig_match:
            pn = sig_match.group(1)
            task = sig_match.group(2)[3:]
            sig = sig_match.group(3)
            arch_file = dir + sig[0:2] + "/sstate:" + pn + ":" + arch1 + ":*:*:" + arch2 + ":" + sstate_version + ":" + sig + "_" + task + ".tgz"
            noarch_file = dir + sig[0:2] + "/sstate:" + pn + "::*:*::" + sstate_version + ":" + sig + "_" + task + ".tgz"
            TouchMatches(arch_file, deadline)
            TouchMatches(noarch_file, deadline)
        elif ignore_re.match(line):
            pass
        else:
            sys.stderr.write('touch-sstate: Unparsed line %d\n' % line_number)
            sys.exit(1)
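The attachment above targets Python 2. For anyone adapting it, here is a minimal Python 3 sketch of just the atime-refresh step, following the same stat-then-read approach. The name `refresh_atime` and the returned list are additions for illustration, not part of the posted script.

```python
import glob
import os


def refresh_atime(pattern, deadline):
    """Read one byte from each matching file whose atime is before deadline.

    Returns the paths that were read (hypothetical helper).
    """
    refreshed = []
    for path in sorted(glob.glob(pattern)):
        # Stat first: it is cheaper than opening every file, and most
        # files are expected to have been accessed recently already.
        if os.stat(path).st_atime < deadline:
            # Reading rather than os.utime() works even when the file
            # belongs to another user, as noted in the script above.
            with open(path, "rb") as f:
                f.read(1)
            refreshed.append(path)
    return refreshed
```

Whether the read actually advances the access time depends on how the filesystem is mounted; with noatime it will not, so an atime-based expiry scheme needs relatime or strictatime on the cache volume.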
* Re: Over-pruning the sstate cache
From: Paul Eggleton @ 2016-04-13 21:33 UTC
To: Mike Crowe; +Cc: Otavio Salvador, openembedded-core

On Wed, 13 Apr 2016 11:11:11 Otavio Salvador wrote:
> On Wed, Apr 13, 2016 at 11:01 AM, Mike Crowe <mac@mcrowe.com> wrote:
> > On Wednesday 13 April 2016 at 10:47:13 -0300, Otavio Salvador wrote:
> >> Couldn't this be done in a way similar to the fetchall task?
> >
> > That's the sort of thing I was thinking of with my "all_populate_sysroot"
> > task in my original question.
> >
> > But, I've a working script using Richard's method now. I'll share it once
> > I've tested it a bit more.
>
> Having it as a task allows it to avoid races and the like when running
> inside BitBake. It can also reuse the metadata database information and
> ought to be faster. Once you share the script I will see if I can rework
> it into a task and see how it looks.

FWIW there's a bug open to implement this as an option to bitbake:

https://bugzilla.yoctoproject.org/show_bug.cgi?id=7733

It's currently assigned to me and was targeted for 2.0; at this stage it'll
be 2.1. If there's interest in having this implemented we can certainly
increase the priority, since up to this point it was just an idea I had.

Cheers,
Paul

--
Paul Eggleton
Intel Open Source Technology Centre
* Re: Over-pruning the sstate cache
From: Richard Purdie @ 2016-04-13 21:59 UTC
To: Paul Eggleton, Mike Crowe; +Cc: Otavio Salvador, openembedded-core

On Thu, 2016-04-14 at 09:33 +1200, Paul Eggleton wrote:
> On Wed, 13 Apr 2016 11:11:11 Otavio Salvador wrote:
> > On Wed, Apr 13, 2016 at 11:01 AM, Mike Crowe <mac@mcrowe.com> wrote:
> > > On Wednesday 13 April 2016 at 10:47:13 -0300, Otavio Salvador
> > > wrote:
> > > > Couldn't this be done in a way similar to the fetchall task?
> > >
> > > That's the sort of thing I was thinking of with my
> > > "all_populate_sysroot" task in my original question.
> > >
> > > But, I've a working script using Richard's method now. I'll share
> > > it once I've tested it a bit more.
> >
> > Having it as a task allows it to avoid races and the like when
> > running inside BitBake. It can also reuse the metadata database
> > information and ought to be faster. Once you share the script I
> > will see if I can rework it into a task and see how it looks.
>
> FWIW there's a bug open to implement this as an option to bitbake:
>
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=7733
>
> It's currently assigned to me and was targeted for 2.0; at this stage
> it'll be 2.1. If there's interest in having this implemented we can
> certainly increase the priority, since up to this point it was just an
> idea I had.

Whilst I can see the attraction of Paul's bug for some of the existing
tasks we have, I'm not seeing this as an easy way to fix the
timestamp/touch problem you're mentioning. Partly that is because it
would end up running all the tasks in question rather than simply
knowing what their hash was and then touching the appropriate files.

I think a recursive task based solution would get very very messy and
be hard to make work right. What you'd probably have to do is have a
specific "touch sstate" task which used the sstate internal functions
to calculate the sstate filenames for all sstate tasks in a given
recipe and then ran the touch operation. I guess something like that
would then benefit from Paul's recursive operation.

bitbake -S is slow as it basically involves a recipe reparse. If your
cache is cold, it will end up with two parses, one initial parse, then
the parse which actually does the work. Offhand, I'm not sure if we
could generate the locked sigs data from the data in memory or not. I
know we can't generate the sigdata/siginfo files from that alone.

Just looking at the code, it is so slow because it reparses without
parallel process usage. I think this is so the dumpsigs code has all
the data it needs in the current process rather than spread over
multiple processes with no way to easily transfer the amount of data
involved. Definitely something we should look to fix now that -S is
being more commonly used.

Cheers,

Richard