Yocto Technical Team Minutes/Engineering Sync for Feb 23, 2021

From: "Trevor Woerner" <twoerner@gmail.com>
To: yocto@lists.yoctoproject.org
Subject: Yocto Technical Team Minutes/Engineering Sync for Feb 23, 2021
Date: Thu, 25 Feb 2021 09:23:03 -0500
Message-ID: <20210225142303.GA26834@localhost>

Yocto Technical Team Minutes, Engineering Sync, for Feb 23, 2021
archive: https://docs.google.com/document/d/1ly8nyhO14kDNnFcW2QskANXW3ZT7QwKC5wWVDg9dDH4/edit

== disclaimer ==
Best efforts are made to ensure the below is accurate and valid. However,
errors sometimes happen. If any errors or omissions are found, please feel
free to reply to this email with any corrections.

== attendees ==
Trevor Woerner, Stephen Jolley, Scott Murray, Armin Kuster, Michael
Halstead, Steve Sakoman, Richard Purdie, Randy MacLeod, Saul Wold, Jon
Mason, Joshua Watt, Paul Barker, Tim Orling, Mark Morton, John Kaldas,
Alejandro H, Ross Burton

== notes ==
- 3.2.2 passed QA clean, awaiting final approval from TSC
- 3.1.6 built and in QA
- 1 week before -m3 should be built (feature freeze for 3.3)
- adding RPM to reproducibility, still needs some work
- recipe maintainers: please review the patches we’re carrying (push
  upstream as many as possible)
- glibc 2.33 issue should be resolved with latest pseudo
- AUH patches are now merged or queued, few of the failures handled
- AB issue list is at record high (not good)

== general ==
RP: can’t get stable test results out of AB on master


RP: would be nice to get RPM reproducibility issues ironed out, but there are
    some epoch issues to work through that mess up diffoscope
RP: was surprised to see how bad the interactive response is on the cmdline on
    the builders. it seems like an I/O bottleneck
Randy: mostly I/O to SSDs?
RP: i believe so
RP: it was immediately after a build had been started. so it could be related
    to downloads or sstate fetching. how much sstate did you expect that build
    to be reusing?
SteveS: version bumps to connman, kernel… yea that could lead to a lot of
    rebuilds
Michael: we’ve been optimizing for throughput for a while. on some other
    build systems we leave some overhead available for cmdline interactivity.
    should we start to do that with the YP AB?
RP: i think it would have to be backed off by a significant amount to get that
    breathing space. so maybe yes, but we’d have to look at it to see what
    to backoff and by how much. looking through the build i see that 77% was
    pulled from sstate (therefore pulling data off the NAS, then extracting
    it).
Randy: and that’s not coordinated at all, if there are 100 items, then 100
    threads?
RP: but limited by BB_NUMBER_THREADS
JPEW: run by buildbot? maybe we could use cgroups?
RP: i don’t think it’s CPU bound, the CPU was 50% idle when the cmdline
    was very slow
MichaelH: sometimes we see that when the system isn’t healthy, i wonder if
    it’s isolated to specific machines?
RP: on the CentOS machine, a command took over 5 minutes to complete. then
    tried debian, same thing. then logged into the fedora machine and was
    able to do stuff. but it didn’t seem isolated to any machines, it
    seemed localized in time (i.e. right after a build had been started),
    then dropped off. so i feel that it might be related to the initial build
    startup, probably related to sstate pulling/extraction
Randy: could also limit I/O using cgroups
RP: we do use ionice for parts of the build (2.1 and 2.7)
Michael: translation: class 2 priority 1; class 2 priority 7
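
The "class.priority" shorthand above can be unpacked mechanically. What follows is only a minimal sketch: the helper names are hypothetical, and the only thing assumed from outside these minutes is the ionice(1) command from util-linux with its -c (class) and -n (priority) options. Within the best-effort class, 0 is the highest priority and 7 the lowest.

```python
# Sketch: decode shorthand such as "2.1" into ionice class/priority arguments.
# parse_ionice_level and ionice_argv are hypothetical names used for illustration.

IONICE_CLASSES = {1: "realtime", 2: "best-effort", 3: "idle"}


def parse_ionice_level(level):
    """Split a string like '2.7' into (class, priority) integers."""
    cls_text, prio_text = level.split(".", 1)
    cls, prio = int(cls_text), int(prio_text)
    if cls not in IONICE_CLASSES:
        raise ValueError(f"unknown I/O scheduling class: {cls}")
    if not 0 <= prio <= 7:
        raise ValueError(f"priority out of range 0-7: {prio}")
    return cls, prio


def ionice_argv(level, command):
    """Build an argv list that runs `command` under the given I/O priority."""
    cls, prio = parse_ionice_level(level)
    return ["ionice", "-c", str(cls), "-n", str(prio)] + list(command)


if __name__ == "__main__":
    for level in ("2.1", "2.7"):
        cls, prio = parse_ionice_level(level)
        print(level, "->", IONICE_CLASSES[cls], "priority", prio)
```

The argv list could be handed to subprocess.run() to start a task at that priority; whether the builders invoke it this way is not stated in the discussion.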
Alejandro: are these sharing any hardware?
RP: they’re all connected to the NAS
Michael: and they’re 100% dedicated to this work
RP: i don’t think this is a network bottleneck, i think this is sstate extraction
JPEW: maybe a different compression algorithm? gzip is notoriously slow
RP: wouldn’t that make it worse?
JPEW: does each AB have its own mirror?
RP: it’s all done over NFS
JPEW: network bandwidth should be lower than local unzipping/extraction bandwidth?
Alejandro: could we try different I/O scheduling?
RP: don’t know
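
One way to put numbers behind the compression question is to time decompression of the same payload with several codecs. The sketch below is only an illustration under stated assumptions: it uses the codecs in the Python standard library (gzip, bz2, xz), a synthetic payload that is not representative of real sstate archives, and hypothetical function names. It measures CPU-side decompression only, not NFS or disk I/O.

```python
# Sketch: compare decompression time of stdlib codecs on one synthetic payload.
# The payload and function names are illustrative assumptions, not sstate data.
import bz2
import gzip
import lzma
import time

CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def make_payload(size=1_000_000):
    """Build a repetitive byte string of exactly `size` bytes."""
    chunk = bytes(range(256)) * 16
    return (chunk * (size // len(chunk) + 1))[:size]


def time_decompression(payload):
    """Return {codec: (compressed_size, seconds_to_decompress)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        blob = compress(payload)
        start = time.perf_counter()
        restored = decompress(blob)
        elapsed = time.perf_counter() - start
        if restored != payload:
            raise RuntimeError(f"{name} round trip did not restore the payload")
        results[name] = (len(blob), elapsed)
    return results


if __name__ == "__main__":
    for name, (size, seconds) in time_decompression(make_payload()).items():
        print(f"{name}: {size} bytes compressed, {seconds:.4f}s to decompress")
```

Running this against a real sstate object on a builder, rather than the synthetic payload, would be needed before drawing any conclusion.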

RP: had a look at patches. 1,300 patches in oe-core, ~600 are pending, the rest
    are submitted or not appropriate. some of these are 10 years old or older,
    do we still need them? i sent 2 upstream and was told they weren’t needed
    anymore (problem fixed in other ways). there’s also one in valgrind that
    looks similar (different fix upstream) and is not needed.
Ross: if some people could try to do 1 a day that would be a huge help
RP: lots of patches related to reproducibility
JPEW: the big issue with perf is that it uses bison (which needs patches)

PaulB: read-only mode for PR server. i’ve been working on it, but it’s 1
    big patch. there’s code to handle daemonizing and forking, which in the
    hash server is done with python multiprocessing. we also want to use the
    same RPC mechanisms that python is using. are those good lines along which
    to break the patches down?
RP: that sounds perfect
PaulB: it was easier to bash on it all together, then go back and break it up
    into digestible chunks
RP: it’s 10 year old code, so i’m not surprised
PaulB: i’ve broken out a part that uses JSON RPC, then use that for the
    server
RP: sounds good to me
JPEW: me too
RP: scaling that code under python2 was a challenge. glad to see this moving
    forward

RP: Randy posted rust patch set. felt it couldn’t be merged in this form
    (too many patches)
Randy: do you want the history squashed?
RP: that was my feeling
Randy: i’ve been working on it bit by bit as stuff happens upstream which
    leads to lots of little commits. but i can reorg by logical group and
    squash the log
RP: yes. in one case there were lots of commits to the rust version, then in
    the end you end up with 2
Randy: someone from MS worked on getting the sdk stuff working
RP: given that next Monday is the feature freeze, let’s get the patches out
    sooner, and worry about the sdk later
Randy: ok. last remaining issue is the pre-fetcher but i don’t know much
    about it. looked at PaulB’s patches
PaulB: there are 3 methods floating around, i’ve focused on one of them that
    i like
1. doing the download ahead of time in do_fetch()
2. let rust-bin do the downloads itself in do_compile() which i don’t like
3. haven’t looked at the last one yet
PaulB: i like 1 because it asks rust to output a cargo which the fetcher can
    then act on
Randy: doesn’t rely on crates?
PaulB: i think it relies on crates for things that it can’t resolve. however
    Andreas’ approach relies on getting bitbake to understand the Cargo.toml
    file, not sure if that’s a good approach
Randy: are there any lessons with Go that we can use?
Scott: Bruce would be a good one to talk to
PaulB: my understanding with Go is that the code tends to all be placed
    together in the git repository, so the fetch side is a little simpler
Randy: so given the approach we’re using is there anything that needs to be
    added?
PaulB: it needs testing
Randy: i have a team working on testing the rust compiler itself. they can
    successfully execute 2/3 of the tests now (of which 99.9% pass). i have
    a reproducibility test for rust hello world, but it takes a long time to
    run. any tests you’re thinking of?
PaulB: fetcher tests. if you have a “crate://” in a URL, just making sure
    it gets translated correctly to make sure it doesn’t bitrot
Randy: is that an -m3 or -m4 activity?
RP: if we’ll get it in -m4 for sure we can wait until then. in oe-core
    we’d want some sort of hello world (make sure compiler works and we can
    run the binaries)
Randy: we have that already
RP: both for cross-compiling and target. then reproducibility tests. i’m
    happy to build them up, as long as there’s a roadmap. for -m3 i think we
    should get the baseline rust set and the crate fetcher
PaulB: crate fetcher overrides the wget fetcher and makes sure everything gets
    put in the right place. so it just needs a couple test cases; a map of
    inputs to outputs. i’ll resubmit the patch and include a list of tests
    that we need to add
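
The "map of inputs to outputs" idea can be sketched as a small table-driven check. This is not the fetcher code under discussion: the function name is hypothetical, the ripgrep version is just an example value, and the translation to the crates.io download path (/api/v1/crates/NAME/VERSION/download) is an assumption made for illustration.

```python
# Sketch: translate crate:// URLs and check them against an expected mapping.
# crate_url_to_https is a hypothetical helper; the URL layout is an assumption.
from urllib.parse import urlparse


def crate_url_to_https(url):
    """Turn crate://HOST/NAME/VERSION into an https download URL."""
    parsed = urlparse(url)
    if parsed.scheme != "crate":
        raise ValueError(f"not a crate:// URL: {url}")
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) != 2:
        raise ValueError(f"expected crate://HOST/NAME/VERSION, got: {url}")
    name, version = parts
    return f"https://{parsed.netloc}/api/v1/crates/{name}/{version}/download"


# Inputs mapped to the outputs the translation is expected to produce.
EXPECTED = {
    "crate://crates.io/ripgrep/12.1.1":
        "https://crates.io/api/v1/crates/ripgrep/12.1.1/download",
}

if __name__ == "__main__":
    for source, wanted in EXPECTED.items():
        got = crate_url_to_https(source)
        print("ok " if got == wanted else "BAD", source, "->", got)
```

Growing the EXPECTED table is then the cheap way to guard against bitrot: each new case is one more line.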
RP: if someone could reply to Andreas and let him know what’s going on and
    why we’re going in a slightly different direction than the work he’s
    submitted
PaulB: the fundamental unit is the recipe. devtool is the place for some of
    the functionality not bitbake
Randy: building rust hello world works for all glibc qemu targets but some
    breakage with musl (risc-v and powerpc) i think Khem is working on the
    risc-v one. will that hold things up?
RP: no
PaulB: i have some slightly larger rust packages (larger than hello world)
    that i think will test things a little more thoroughly, e.g. ripgrep
Randy: we’re testing that one already. should we add it to oe-core?
RP: it would be good to have something in oe-core to do testing
Randy: i think hello world would be good enough for oe-core and leave larger
    tests for other layers
PaulB: there are rust things in oe-core (librsvg, etc) so i think oe-core will
    have good test things already in them without having to add recipes just
    for testing’s sake
Randy: things also seem to work well on ARM builders
Ross: yes, things are on-target

TrevorW: started work on 2021 YP conference. conversation moved to
    conferences@lists.yoctoproject.org if you want to follow along or help

RP: fetch, workdir, can’t clean up workdir, config changes but can’t
    cleanup. maybe we should fetch to a special dir and then just symlink
PaulB: would it be recipe-specific?
RP: yes, it would be under $WORKDIR
PaulB: would make the archiver easier
TimO: there are a number of Go modules that don’t cleanup properly so maybe
    this would help
ScottM: there are lots of recipes that do post-processing on files in $WORKDIR
    before moving them to the artifacts directory, so there could be breakage
    there
PaulB: can we do it first thing next release?
RP: we’ll give it a try and see
TrevorW: i think a lot of BSP layers will be affected
RP: i think there’s a good chance a lot of things (not just BSPs) will be
    affected
ScottM: there are some BSP things that will be affected, but in AGL we’re
    doing a lot of $WORKDIR manipulations that aren’t necessarily BSP
    related as well
(several): overall it sounds like a good idea and a good cleanup to try
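
The "fetch to a special dir and then just symlink" idea can be sketched outside of bitbake. Everything here is an assumption for illustration: the function name, the "sources-unpack" subdirectory name, and the use of an in-memory dict in place of real fetched files. The point is that wiping one dedicated directory, plus the symlinks that point into it, is enough to clean up after a config change.

```python
# Sketch: unpack into a dedicated subdirectory of the work directory and
# expose each file through a symlink, so stale files are easy to remove.
# unpack_and_link and "sources-unpack" are hypothetical names.
import os
import shutil
import tempfile


def unpack_and_link(workdir, sources, unpack_subdir="sources-unpack"):
    """Write `sources` ({name: text}) under workdir/unpack_subdir and symlink them."""
    unpack_dir = os.path.join(workdir, unpack_subdir)

    # Remove symlinks left by a previous run that point into the unpack dir.
    for entry in os.listdir(workdir):
        path = os.path.join(workdir, entry)
        if os.path.islink(path) and os.readlink(path).startswith(unpack_subdir + os.sep):
            os.unlink(path)

    # Start from an empty unpack directory every time.
    if os.path.isdir(unpack_dir):
        shutil.rmtree(unpack_dir)
    os.makedirs(unpack_dir)

    links = []
    for name, text in sources.items():
        with open(os.path.join(unpack_dir, name), "w") as handle:
            handle.write(text)
        link = os.path.join(workdir, name)
        # Relative target keeps the link valid if the work directory moves.
        os.symlink(os.path.join(unpack_subdir, name), link)
        links.append(link)
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        unpack_and_link(workdir, {"defconfig": "A=y\n", "fix.patch": "diff\n"})
        unpack_and_link(workdir, {"defconfig": "A=n\n"})  # config changed
        print(sorted(os.listdir(workdir)))
```

Recipes that post-process files in place, as ScottM notes, would be modifying the unpacked copies through the links, which is one place where breakage could show up.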
