Hello!

rm_work.bbclass currently isn't good enough for use in really tightly constrained environments. For example, building core-image-sato with 20GB of free disk space should work (final disk usage with rm_work active: <13GB). But in practice, that disk space gets exhausted in the middle of a build because do_rm_work doesn't run often and/or soon enough.

I ran into this when building meta-intel-iot-security under TravisCI. I'm now trying to improve what I did to enhance rm_work.bbclass and submit it upstream. It isn't ready yet, so consider this a request for comments. The current code, as a set of patches on top of poky, is here:
https://github.com/pohly/poky/commits/rmwork

While working on this I wanted to see system resource usage over time. I found that pybootchart has support for drawing such charts, but that code is all disabled because the information isn't collected for a build. I've added that to buildstats.bbclass and also added logging of the disk usage, by hooking into the existing disk monitoring.

Collecting this information works best when done at regular time intervals. There's no support for that in current bitbake, so I had to add something (HeartbeatEvent).

Data collection is currently enabled by default and cannot be turned off. There is some rate limit, but despite that, for a fast core-image-sato build of 45 minutes, the additional logging adds 14M to the buildstats, an increase of 66%. The overhead is larger for slower builds (the information about tasks stays the same, but a longer time range means more system usage information): 127%.

Should this data collection be enabled by default when buildstats.bbclass is active? If yes, then I can think of several ways of reducing the overhead (pre-processing the information instead of writing raw /proc dumps, on-the-fly compression), but it might not be worthwhile to bother with that if the data only gets written when explicitly requested. OTOH, parsing seems a bit slower now, so some further optimizations may be useful either way.

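To make the overhead discussion a bit more concrete, here is a minimal standalone sketch of rate-limited sampling combined with pre-processing and on-the-fly compression. It is not the code from the patches and does not use the bitbake event API: the class name DiskUsageSampler, the on_heartbeat() method and the JSON-lines log format are all made up for illustration.

```python
import gzip
import json
import shutil
import time


class DiskUsageSampler:
    """Log disk usage for some directories, at most once per interval.

    Hypothetical helper for illustration only; this is not how
    buildstats.bbclass is implemented.
    """

    def __init__(self, log_path, directories, min_interval=1.0):
        self.directories = list(directories)
        self.min_interval = min_interval
        self.last_sample = None
        self.samples_written = 0
        # Text-mode gzip stream: samples get compressed while being written.
        self.log = gzip.open(log_path, "at", encoding="utf-8")

    def on_heartbeat(self, now=None):
        """Handle one periodic tick; return True if a sample was written."""
        now = time.time() if now is None else now
        if self.last_sample is not None and now - self.last_sample < self.min_interval:
            return False  # rate limit: this tick came too soon
        self.last_sample = now
        # Pre-processing: store only the used bytes, not a raw dump.
        record = {"time": now}
        for directory in self.directories:
            record[directory] = shutil.disk_usage(directory).used
        self.log.write(json.dumps(record) + "\n")
        self.samples_written += 1
        return True

    def close(self):
        self.log.close()
```

In such a design the caller would invoke on_heartbeat() from whatever periodic event is available and call close() at the end of the build. Whether the space savings justify the extra code is exactly the open question above.
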
Personally I'd prefer to have it enabled by default, because it is useful to have the information after something unusual happens - perhaps it can't be reproduced, or doing so would be slow.

If it is enabled by default, and the data is small enough and the runtime overhead low enough, should it be possible to disable it at all when buildstats.bbclass is inherited? How small and low is enough?

In the current implementation, disk space usage is logged exactly for those volumes that are monitored already (i.e. BB_DISKMON_DIRS). Does anyone see a need to configure that separately? I personally didn't.

Now regarding rm_work.bbclass: the main problem (unfortunately not solved to my own satisfaction) is that the do_rm_work cleanup task depends on do_build, to ensure that all work for the recipe really is done. But that has the effect that foo:do_rm_work also depends on bar:do_build when there is a foo->bar dependency. That's how bitbake ensures that building foo also builds everything needed by it.

I tried injecting do_rm_work between the tasks of a recipe and do_build via an anonymous python method. That also allows removing the BB_DEFAULT_TASK modification.

The problem with that is determining "the tasks of a recipe". As I suspected, the approach runs into problems when some other classes also use anonymous python code to add tasks. In practice, do_package_write_rpm/ipk/deb ended up getting added after the do_rm_work injection, leading to race conditions. To move on with my testing, I hard-coded those tasks as pre-conditions for do_rm_work, but that's just too fragile.

To solve this I can think of two solutions:
1. ordering anonymous python methods by priority and establishing some kind of convention for which changes are done at which priority - no idea how realistic or hard that is
2. a dedicated hook for rm_work.bbclass in bitbake, similar to calculate_extra_depends in meta-world-pkgdata.bb

In addition, I also played with a different scheduler.
The "completion" scheduler used by rm_work.bbclass orders tasks like this (most important first):
1. ID /work/poky/meta/recipes-devtools/quilt/quilt-native_0.64.bb:do_fetch
2. ID /work/poky/meta/recipes-devtools/quilt/quilt-native_0.64.bb:do_unpack
3. ID /work/poky/meta/recipes-devtools/quilt/quilt-native_0.64.bb:do_patch
4. ID /work/poky/meta/recipes-devtools/quilt/quilt-native_0.64.bb:do_configure
5. ID /work/poky/meta/recipes-devtools/quilt/quilt-native_0.64.bb:do_compile
6. ID /work/poky/meta/recipes-devtools/quilt/quilt-native_0.64.bb:do_install
7. ID /work/poky/meta/recipes-devtools/quilt/quilt-native_0.64.bb:do_populate_sysroot
8. ID /work/poky/meta/recipes-devtools/quilt/quilt-native_0.64.bb:do_populate_lic
9. ID /work/poky/meta/recipes-devtools/quilt/quilt-native_0.64.bb:do_rm_work
10. ID /work/poky/meta/recipes-devtools/quilt/quilt-native_0.64.bb:do_build
11. ID /work/poky/meta/recipes-extended/texinfo-dummy-native/texinfo-dummy-native.bb:do_fetch
12. ID /work/poky/meta/recipes-extended/texinfo-dummy-native/texinfo-dummy-native.bb:do_unpack
13. ID /work/poky/meta/recipes-extended/texinfo-dummy-native/texinfo-dummy-native.bb:do_patch
14. ID /work/poky/meta/recipes-extended/texinfo-dummy-native/texinfo-dummy-native.bb:do_configure
15. ID /work/poky/meta/recipes-extended/texinfo-dummy-native/texinfo-dummy-native.bb:do_compile
16. ID /work/poky/meta/recipes-extended/texinfo-dummy-native/texinfo-dummy-native.bb:do_install
17. ID /work/poky/meta/recipes-extended/texinfo-dummy-native/texinfo-dummy-native.bb:do_populate_sysroot
18. ID /work/poky/meta/recipes-extended/texinfo-dummy-native/texinfo-dummy-native.bb:do_populate_lic
19. ID /work/poky/meta/recipes-extended/texinfo-dummy-native/texinfo-dummy-native.bb:do_rm_work
20. ID /work/poky/meta/recipes-extended/texinfo-dummy-native/texinfo-dummy-native.bb:do_build
21. ID virtual:native:/work/poky/meta/recipes-devtools/gnu-config/gnu-config_git.bb:do_fetch
...

The completion scheduler basically starts with the most important recipe (as determined by a dependency analysis that checks for the number of other recipes directly or indirectly depending on it) and then lists all tasks in that recipe, then continues with the next recipe.

My impression was that this can lead to the following undesirable situation: foo:do_rm_work and bar:do_compile are both ready to run, "bar" is more important than "foo" => another compile job gets started instead of cleaning up some disk space first.

So I wrote a scheduler which orders first by tasks, starting with the ones that are necessary to complete a recipe. The ordering of recipes is the same as in the speed scheduler, i.e. if normally foo:do_compile comes before bar:do_compile, then it also does so with the rmwork scheduler. The result is something like this:
1. ID /work/poky/meta/recipes-support/popt/popt_1.16.bb:do_build
2. ID /work/poky/meta/recipes-core/readline/readline_6.3.bb:do_build
3. ID /work/poky/meta/recipes-connectivity/libnss-mdns/libnss-mdns_0.10.bb:do_build
...
464. ID /work/poky/meta/recipes-sato/images/core-image-sato.bb:do_build
465. ID /work/poky/meta/recipes-graphics/xorg-proto/inputproto_2.3.2.bb:do_rm_work
466. ID /work/poky/meta/recipes-devtools/python/python3_3.5.2.bb:do_rm_work
467. ID /work/poky/meta/recipes-core/packagegroups/packagegroup-base.bb:do_rm_work
...
3620. ID virtual:native:/work/poky/meta/recipes-extended/pbzip2/pbzip2_1.1.13.bb:do_install
3621. ID /work/poky/meta/recipes-devtools/qemu/qemu-helper-native_1.0.bb:do_install
3622. ID /work/poky/meta/recipes-core/zlib/zlib_1.2.8.bb:do_compile_ptest_base
3623. ID /work/poky/meta/recipes-extended/bzip2/bzip2_1.0.6.bb:do_compile_ptest_base
...
3645. ID /work/poky/meta/recipes-support/libevent/libevent_2.0.22.bb:do_compile_ptest_base
3646. ID /work/poky/meta/recipes-core/busybox/busybox_1.24.1.bb:do_compile_ptest_base
3647. ID /work/poky/meta/recipes-kernel/linux/linux-yocto_4.8.bb:do_uboot_mkimage
3648. ID /work/poky/meta/recipes-kernel/linux/linux-yocto_4.8.bb:do_sizecheck
3649. ID /work/poky/meta/recipes-kernel/linux/linux-yocto_4.8.bb:do_strip
3650. ID /work/poky/meta/recipes-kernel/linux/linux-yocto_4.8.bb:do_compile_kernelmodules
3651. ID /work/poky/meta/recipes-kernel/linux/linux-yocto_4.8.bb:do_shared_workdir
3652. ID /work/poky/meta/recipes-kernel/linux/linux-yocto_4.8.bb:do_kernel_link_images
3653. ID /work/poky/meta/recipes-devtools/quilt/quilt-native_0.64.bb:do_compile
3654. ID /work/poky/meta/recipes-extended/texinfo-dummy-native/texinfo-dummy-native.bb:do_compile
...

Finally, I also added the possibility to run more of the "light" tasks like do_rm_work in parallel to "heavy" tasks like do_compile. The intention was that if BB_NUM_THREADS is exhausted by active do_compile tasks (which are usually CPU bound), then running some additional do_rm_work tasks in parallel will help to keep disk usage lower and also won't interfere much with ongoing compile jobs, because file removal is mostly IO intensive. It might also be beneficial to prepare the next do_compile with do_fetch/unpack/patch/configure.

Orthogonal to that, I implemented the possibility to let downloaded sources get removed by do_rm_work by using a per-recipe DL_DIR (see rm_work_and_downloads.bbclass). This trades potentially downloading the same source multiple times against disk usage. It's not ideal and unusable for incremental builds (removing the source currently does not trigger a re-download), but was enough for me under TravisCI.

But how well do these different improvements really work, and which ones are worth the additional complexity?

I ran benchmarks with building core-image-sato on x86-64 for the default qemux86, on an 8 core + hyperthreading i7-5960X Processor Extreme Edition (20M Cache, up to 3.50 GHz) with 32GB of RAM. Disk space is on a striped RAID array of two conventional disks. Test script attached.

I only ran this once, so I can't vouch that the numbers are stable.
I remember reading about a benchmark wrapper which can run such a test script multiple times and then automatically merges the output of all runs, adding min/max/average/deviation, but I couldn't find the tool again. Does anyone know which tool does that?

Base build (no rm_work, BB_NUM_THREADS=16):
elapsed: 45:38.18
final disk usage: 32538MiB
max disk usage: 32475MiB

Original rm_work.bbclass:
elapsed: 44:20.16
final disk usage: 12875MiB
max disk usage: 18740MiB

The maximum disk usage here is from the buildstats, computed as the delta between the lowest and the highest observed value. It is a bit too low because disk monitoring starts after parsing recipes and may miss spikes. I also ran experiments with building in a fixed 20GB partition, and that was not enough: the builds were always aborted by the disk monitor when hitting the 19GB mark.

Total build time, though, is a bit lower, also with the other rm_work configurations. I'm not exactly sure why that is - perhaps the disk cache works better because it can reuse blocks?

Attached is the pybootchart of that build. Note how disk space usage and the amount of cached pages go down in parallel at the end of the build.

new-rmwork (= same scheduler, do_rm_work injected before do_build):
elapsed: 44:03.53
final disk usage: 12873MiB
max disk usage: 14444MiB

That's the major improvement. I did the scheduler changes first, but those were not enough to get a build to complete in a 20GB partition because the do_build dependencies prevented do_rm_work from running.

new-rmwork-new-scheduler:
elapsed: 42:58.54
final disk usage: 12873MiB
max disk usage: 14230MiB

A bit better in terms of max disk usage than with the completion scheduler, but not by much.

My observation is that in practice there aren't that many ready tasks, so often the choice of priorities among them doesn't have such a big influence.
The lower bound of the build time is determined by the length of the do_configure->do_compile critical path and it does not matter much when the other tasks run, as long as they run at some point in parallel to that. Perhaps that is different for larger distros. I might redo this benchmark with Ostro OS XT...

new-rmwork-new-scheduler-additional-tasks (16 additional "light" tasks allowed):
elapsed: 45:20.15
final disk usage: 12876MiB
max disk usage: 13838MiB

This brings maximum disk usage down a bit more, but perhaps now the additional tasks do start to interfere with the critical path, because total build time goes up. The implementation is a hack, too (see comments in the code). I'm inclined to just drop this mode.

Because TravisCI not only has little disk space, but also few CPUs, I also ran this experiment overnight with BB_NUM_THREADS and the entire build pinned to just four CPUs with "taskset -c 4,5,6,7".

base
elapsed: 1:29:40
final disk usage: 32459MiB
max disk usage: 32430MiB

original-rmwork
elapsed: 1:30:56
final disk usage: 12881MiB
max disk usage: 16449MiB

new-rmwork
elapsed: 1:31:06
final disk usage: 12856MiB
max disk usage: 14412MiB

new-rmwork-new-scheduler
elapsed: 1:31:39
final disk usage: 12860MiB
max disk usage: 14423MiB

new-rmwork-new-scheduler-additional-tasks
elapsed: 1:27:38
final disk usage: 12887MiB
max disk usage: 14485MiB

This last build is the first time that I ever got more than BB_NUM_THREADS do_compile tasks running in parallel. There wasn't enough parallelism in the build for that when using 16 CPUs.

The results are not that different. My overall conclusion still is that injecting do_rm_work is crucial, while the improved scheduler isn't that relevant, at least not in these tests.

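For anyone who wants to see the difference between the two task orderings discussed earlier in concrete terms, here is a minimal sketch. It is a toy model and not bitbake's scheduler code: the function names are made up, the task list is abbreviated, and it assumes one fixed recipe order for all task types, whereas the real orderings come from the dependency analysis.

```python
# Tasks in the order they run within one recipe (abbreviated, illustrative).
TASK_ORDER = [
    "do_fetch", "do_unpack", "do_patch", "do_configure", "do_compile",
    "do_install", "do_populate_sysroot", "do_populate_lic",
    "do_rm_work", "do_build",
]


def completion_order(recipes):
    """Recipe by recipe: all tasks of the most important recipe come first."""
    return [(recipe, task) for recipe in recipes for task in TASK_ORDER]


def task_first_order(recipes):
    """Task type first, starting with the tasks that complete a recipe.

    Within one task type, recipes keep their relative importance.
    """
    return [(recipe, task) for task in reversed(TASK_ORDER) for recipe in recipes]


def pick_next(priority_list, ready):
    """Return the ready task that appears earliest in the priority list."""
    for entry in priority_list:
        if entry in ready:
            return entry
    return None


recipes = ["bar", "foo"]  # "bar" is more important than "foo"
ready = {("foo", "do_rm_work"), ("bar", "do_compile")}

print(pick_next(completion_order(recipes), ready))  # ('bar', 'do_compile')
print(pick_next(task_first_order(recipes), ready))  # ('foo', 'do_rm_work')
```

With the recipe-by-recipe ordering the compile job for "bar" wins, while the task-first ordering runs the cleanup for "foo" first. As the numbers show, this only matters when several tasks are ready at the same time, which was rare in these builds.
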
-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although I am an employee of Intel, the statements I make here in no way represent Intel's position on the issue, nor am I authorized to speak on behalf of Intel on this matter.