Re: [docs] [yocto-docs PATCH v2] Add scripts to build the docs in containers

public inbox for docs@lists.yoctoproject.org
 help / color / mirror / Atom feed

From: "Antonin Godard" <antonin.godard@bootlin.com>
To: "Quentin Schulz" <quentin.schulz@cherry.de>,
	<docs@lists.yoctoproject.org>
Cc: "Thomas Petazzoni" <thomas.petazzoni@bootlin.com>
Subject: Re: [docs] [yocto-docs PATCH v2] Add scripts to build the docs in containers
Date: Tue, 10 Dec 2024 15:48:01 +0100	[thread overview]
Message-ID: <D683Q0AOYD75.7XN9KEOCKVTL@bootlin.com> (raw)
In-Reply-To: <2e420969-02db-4116-ac99-38332a3d4b40@cherry.de>

Hi Quentin,

On Tue Dec 10, 2024 at 2:48 PM CET, Quentin Schulz wrote:
> Hi Antonin,
>
> On 12/10/24 1:46 PM, Antonin Godard wrote:
>> Hi Quentin,
>> 
>> On Tue Dec 10, 2024 at 12:08 PM CET, Quentin Schulz wrote:
>>> Hi Antonin,
>>>
>>> On 12/10/24 11:33 AM, Antonin Godard via lists.yoctoproject.org wrote:
>>>> Hi Quentin,
>>>>
>>>> On Fri Dec 6, 2024 at 3:23 PM CET, Quentin Schulz wrote:
>>>>> Hi Antonin,
>>>>>
>>>>> On 12/5/24 12:06 PM, Antonin Godard wrote:
>>>>>> Add two scripts for building a container image and building the
>>>>>> documentation in that image: build-docs-container and
>>>>>> container-build-docs.
>>>>>>
>>>>>
>>>>> Yet there's only one left now :) (also in your commit title).
>>>>
>>>> Oops, will update in v3 :)
>>>>
>>>>>> For now, the documentation/tools/dockerfiles directory contains a
>>>>>> Dockerfile for building with Ubuntu or Debian, but we could extend this
>>>>>> to the supported distros.
>>>>>>
>>>>>> It should be possible to build the full documentation with two commands:
>>>>>>
>>>>>>      ./documentation/tools/build-docs-container ubuntu:24.04
>>>>>>
>>>>>> CONTAINERCMD can be replaced by "podman" to build with podman.
>>>>>>
>>>>>
>>>>> I assume installing podman-docker package could help with that too. Not
>>>>> sure it's recommended, but I do have it installed :)
>>>>
>>>> Didn't know about that one. But it shouldn't be required to use podman or docker
>>>> independently I think?
>>>>
>>>
>>> It's just that when you run `which docker` you would get podman. But we
>>> shouldn't require the user to install podman-docker for the script to
>>> work. Just wanted to mention it.
>>>
>>> [...]
>>>
>>>>>> +SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)
>>>>>> +CONTAINERCMD=${CONTAINERCMD:-docker}
>>>>>> +DOCS_DIR="$SCRIPT_DIR/../.."
>>>>>> +POKY_YAML_IN="$SCRIPT_DIR/../poky.yaml.in"
>>>>>> +
>>>>>> +# This lists the different images we can build and the keys we pass to yq to
>>>>>> +# find packages in poky.yaml.in.
>>>>>> +# The keys should be in the form of "<image>:<version>", as this will be passed
>>>>>> +# to FROM in the selected Dockerfile below.
>>>>>> +# These are common yq keys used for multiple distros.
>>>>>> +_UBUNTU_DEBIAN_YQ_KEYS=".UBUNTU_DEBIAN_HOST_PACKAGES_DOC .UBUNTU_DEBIAN_HOST_PACKAGES_DOC_PDF"
>>>>>> +declare -A YQ_KEYS=(
>>>>>> +  [ubuntu:22.04]="$_UBUNTU_DEBIAN_YQ_KEYS"
>>>>>> +  [ubuntu:24.04]="$_UBUNTU_DEBIAN_YQ_KEYS"
>>>>>> +  [debian:12]="$_UBUNTU_DEBIAN_YQ_KEYS"
>>>>>> +)
>>>>>> +
>>>>>> +# This lists the dockerfile to use for each of the distro listed in YQ_KEYS
>>>>>> +# above. There should be a 1 to 1 match between the keys listed in YQ_KEYS above
>>>>>> +# and the keys listed in DOCKERFILES below.
>>>>>> +declare -A DOCKERFILES=(
>>>>>> +  [ubuntu:22.04]="Dockerfile.ubuntu-debian"
>>>>>> +  [ubuntu:24.04]="Dockerfile.ubuntu-debian"
>>>>>> +  [debian:12]="Dockerfile.ubuntu-debian"
>>>>>> +)
>>>>>> +
>>>>>> +main ()
>>>>>> +{
>>>>>> +  if [ "$#" -lt 1 ]; then
>>>>>> +    echo "No image provided. Provide one of: ${!YQ_KEYS[*]}"
>>>>>> +    exit 1
>>>>>> +  elif [ "$1" = "list" ]; then
>>>>>> +    echo -e "Available container images:\n\n${!YQ_KEYS[*]}"
>>>>>> +    exit 0
>>>>>> +  fi
>>>>>> +
>>>>>> +  local image="$1"
>>>>>> +  shift
>>>>>> +  local make_targets="${*:-publish}"
>>>>>> +
>>>>>> +  for cmd in $CONTAINERCMD yq; do
>>>>>> +    if ! which "$cmd" >/dev/null 2>&1; then
>>>>>> +      echo "The $cmd command was not found. Make sure you have $cmd installed."
>>>>>> +      exit 1
>>>>>> +    fi
>>>>>> +  done
>>>>>> +
>>>>>> +  # Get the appropriate dockerfile from DOCKERFILES
>>>>>> +  dockerfile="${DOCKERFILES[$image]}"
>>>>>> +
>>>>>> +  local temporary_dep_file="$SCRIPT_DIR/dockerfiles/deps"
>>>>>> +  echo -n > "$temporary_dep_file" # empty the file
>>>>>> +  for dep_key in ${YQ_KEYS[$image]}; do
>>>>>> +    yq --raw-output --join-output "$dep_key" "$POKY_YAML_IN" >> "$temporary_dep_file"
>>>>>> +    # add a white space after last element of yq command
>>>>>> +    echo -n " " >> "$temporary_dep_file"
>>>>>> +  done
>>>>>> +
>>>>>
>>>>> Just use a temporary file, this would allow to run builds from two
>>>>> different distros at the same time. mktemp should help you with that.
>>>>
>>>> The problem is that the source in COPY must be part of the build context, so I
>>>> don't having a temp file in /tmp is possible. See
>>>> https://docs.docker.com/reference/dockerfile/#adding-files-from-the-build-context
>>>>
>>>> I could use "mktemp --tmpdir=$SCRIPT_DIR/dockerfiles" but I don't really see the
>>>> added value.
>>>>
>>>
>>> "this would allow to run builds from two different distros at the same time"
>>>
>>> Also, it's guaranteed that an old file wouldn't be used in case the
>>> script is somehow broken.
>> 
>> I see what you mean. However, since mktemp create a random name it creates a
>> new layer each time, invalidating the following commands in the container file.
>> 
>> Instead I propose to move this above:
>> 
>>>>>> +  # docker build doesn't accept 2 colons, so "sanitize" the name
>>>>>> +  local sanitized_dockername
>>>>>> +  sanitized_dockername=$(echo "$image" | tr ':.' '-')
>> 
>> And do this:
>> 
>>    temporary_dep_file="$SCRIPT_DIR/dockerfiles/$sanitized_dockername.deps"
>> 
>> So we avoid the distro clash and we still use cached layers.
>> 
>>>>>> +
>>>>>> +  $CONTAINERCMD build \
>>>>>> +    --tag yocto-docs-$sanitized_dockername:latest \
>>>>>> +    --build-arg ARG_FROM="$image" \
>>>>>> +    --file "$SCRIPT_DIR/dockerfiles/$dockerfile" \
>>>>>> +    "$SCRIPT_DIR/dockerfiles"
>>>>>> +
>>>>>> +  # We can remove the deps file, we no longer need it
>>>>>> +  rm -f "$temporary_dep_file"
>>>>>> +
>>>>>> +  local -a args_run=(
>>>>>> +    --rm
>>>>>> +    --interactive
>>>>>> +    --tty
>>>>>> +    --volume="$DOCS_DIR:/docs:rw"
>>>>>> +    --workdir=/docs
>>>>>> +    --security-opt label=disable
>>>>>> +  )
>>>>>> +
>>>>>> +  if [ "$CONTAINERCMD" = "docker" ]; then
>>>>>> +    args_run+=(
>>>>>> +      --user="$(id -u)":"$(id -g)"
>>>>>> +    )
>>>>>> +  elif [ "$CONTAINERCMD" = "podman" ]; then
>>>>>> +    # we need net access to fetch bitbake terms
>>>>>> +    args_run+=(
>>>>>> +      --cap-add=NET_RAW
>>>>>> +      --userns=keep-id
>>>>>> +    )
>>>>>> +  fi
>>>>>> +
>>>>>> +  $CONTAINERCMD run \
>>>>>> +    "${args_run[@]}" \
>>>>>> +    yocto-docs-$sanitized_dockername \
>>>>>> +    make -C documentation $make_targets
>>>>>> +}
>>>>>> +
>>>>>> +set -eu -o pipefail
>>>>>> +
>>>>>> +main "$@"
>>>>>> diff --git a/documentation/tools/dockerfiles/Dockerfile.ubuntu-debian b/documentation/tools/dockerfiles/Dockerfile.ubuntu-debian
>>>>>> new file mode 100644
>>>>>> index 0000000000000000000000000000000000000000..3c543dc4b0e96dc9c00b201558e2ed00847339fa
>>>>>> --- /dev/null
>>>>>> +++ b/documentation/tools/dockerfiles/Dockerfile.ubuntu-debian
>>>>>> @@ -0,0 +1,18 @@
>>>>>> +ARG ARG_FROM=debian:12 # default value to avoid warning
>>>>>> +FROM $ARG_FROM
>>>>>> +
>>>>>> +ENV DEBIAN_FRONTEND=noninteractive
>>>>>> +
>>>>>> +# relative to the location of the dockerfile
>>>>>> +COPY deps /deps
>>>>>> +
>>>>>> +RUN apt-get update \
>>>>>> + && apt-get --yes --no-install-recommends install $(cat /deps) \
>>>>>> + && apt-get --yes autoremove \
>>>>>> + && apt-get clean \
>>>>>> + && rm /deps
>>>>>> +
>>>>>> +RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen && locale-gen
>>>>>> +
>>>>>> +RUN git config --global --add safe.directory /docs
>>>>>> +
>>>>>>
>>>>>
>>>>> So... I've had another idea.
>>>>>
>>>>> Parsing yaml in shell, not fun.
>>>>>
>>>>> But I don't think we **need** the instructions to be in the YAML file.
>>>>>
>>>>> So I think we could drastically simplify all this by storing the setup
>>>>> instructions in shell scripts which are included in sphinx AND use those
>>>>> shell scripts in the Containerfile. We actually do this for our product
>>>>> user manuals :)
>>>>>
>>>>> Essentially, we would have:
>>>>>
>>>>> $ cat documentation/host-packages-ubuntu.sh
>>>>> sudo apt install gawk wget git diffstat unzip texinfo gcc \ [...]
>>>>>
>>>>> In Sphinx:
>>>>>
>>>>> .. literalinclude:: 50.literalinclude.apt.sh
>>>>>       :language: bash
>>>>>
>>>>> In the appropriate place.
>>>>>
>>>>> How we generate the Containerfile internally:
>>>>> """
>>>>> cat <<EOF > $WORKDIR/merged.sh
>>>>> #!/bin/bash
>>>>> set -eux
>>>>> export DEBIAN_FRONTEND=noninteractive
>>>>> apt-get update
>>>>> apt-get -y install sudo
>>>>> EOF
>>>>>
>>>>> # Merge bash snippets we want to run (in the right order)
>>>>> cat ../source/50.literalinclude.{apt,atf,uboot,linux,debos-prepare,\
>>>>> debos-build-bookworm}.sh >> $WORKDIR/merged.sh
>>>>>
>>>>> chmod +x $WORKDIR/merged.sh
>>>>>
>>>>> podman run --rm --tmpfs=/tmp -v=$WORKDIR:$WORKDIR --security-opt
>>>>> label=disable \  --annotation -w $WORKDIR debian:bookworm ./merged.sh
>>>>> """
>>>>>
>>>>> I assume we could simply use COPY in the Containerfile and include the
>>>>> shell script in there and run the shell script in a single RUN. You
>>>>> could then do the cleanup step in a separate layer, not optimal since
>>>>> the layer will be unnecessarily big but I assume the end image is going
>>>>> to be smaller anyway?
>>>>>
>>>>> The benefit is that you test **exactly** what's in the documentation.
>>>>> And also you don't have to parse YAML. And it should be relatively easy
>>>>> to add support for new distros.
>>>>>
>>>>> What do you think?
>>>>
>>>> I like this idea! Only downside I see is to have to re-run "apt get install" for
>>>> each run command, which is not really ideal IMO. I prefer caching what's already
>>>
>>> No? I have one RUN command which is "execute this shell script".
>>> Whatever is executed as part of this shell script is part of one layer.
>>> You could even have one RUN command which is apt-get update &&
>>> my-shell-script && apt-get cache remove whatever to make this a smaller
>>> layer.
>> 
>> So let me know if I got this wrong, but running "podman run ..." multiple times
>> is going to make use of a cached layer automagically? I.e. merged.sh is not
>> re-run each time? Or you are implying that there is still a Containerfile in the
>> process that caches this data?
>> 
>
> OK, so I actually didn't write what I was thinking about :)
>
> To answer your question, podman run cannot cache what's running in there 
> with the above command, so it's run every time. This is fine for our 
> manual because we do want to run those commands every time and not rely 
> on caching (e.g. because we build the HEAD of some git repo and that 
> **is** desired).
>
> But we could simply generate this merged.sh script before podman build 
> and pass it inside the Containerfile so it makes it as a layer.
>
> Obviously, any change to the shell script would trigger the COPY layer 
> to change and everything after be rebuilt. So the cache will work until 
> the shell script content (mtime actually I believe?) changes.

Okay, we're on the same page, thanks for clarifying!

>>>> been downloaded so we can re-build the doc swiftly (I imagined this script to be
>>>> used for more than verifying that what we list in the dep list is enough to
>>>> build, and used by anyone to test their changes to the docs easily).
>>>>
>>>
>>> You can have as many scripts as you want? And have them do as many or
>>> little things as you want, I'm not sure to follow what you have in mind
>>> here?
>> 
>> That wasn't clear maybe. I simply meant to say that I want to publish this
>> script not only for verifying that listed dependencies are correct, but also
>> because it's convenient for anyone to build the documentation. It's unrelated to
>> your solution.
>> 
>
> I'm working on something right now, will send as RFC so you can comment 
> on it. Should happen today or tomorrow.

Nice! Are you basing things on top of my patch? Or is it separate? I got a v3
ready in any case.

> BTW, I'm unable to build the docs on openSUSE Leap 15.4/15.5/15.6. It 
> seems it's always trying to build from python 3.6 and something breaks 
> in the sphinx part very early:
>
> 0efb6adca777:/docs # make -C documentation/ html
> make: Entering directory '/docs/documentation'
> ./set_versions.py
> Traceback (most recent call last):
>    File "./set_versions.py", line 102, in <module>
>      subprocess.run(["git", "show", "yocto-%s" % 
> release_series[activereleases[0]]], capture_output=True, check=True)
>    File "/usr/lib64/python3.6/subprocess.py", line 423, in run
>      with Popen(*popenargs, **kwargs) as process:
> TypeError: __init__() got an unexpected keyword argument 'capture_output'
> make: *** [Makefile:78: html] Error 1
> make: Leaving directory '/docs/documentation'

As expected, documentation builds aren't really working on all distros :/ Thanks
for reporting this. Hopefully the scripts will help with that.


Antonin

-- 
Antonin Godard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

     prev parent reply	other threads:[~2024-12-10 14:48 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-05 11:06 [yocto-docs PATCH v2] Add scripts to build the docs in containers Antonin Godard
2024-12-06 14:23 ` Quentin Schulz
2024-12-10 10:33   ` Antonin Godard
2024-12-10 11:08     ` [docs] " Quentin Schulz
2024-12-10 12:46       ` Antonin Godard
2024-12-10 13:48         ` Quentin Schulz
2024-12-10 14:48           ` Antonin Godard [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D683Q0AOYD75.7XN9KEOCKVTL@bootlin.com \
    --to=antonin.godard@bootlin.com \
    --cc=docs@lists.yoctoproject.org \
    --cc=quentin.schulz@cherry.de \
    --cc=thomas.petazzoni@bootlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox