Yocto Project Documentation
 help / color / mirror / Atom feed
From: Michael Opdenacker <michael.opdenacker@bootlin.com>
To: docs@lists.yoctoproject.org
Cc: Michael Opdenacker <michael.opdenacker@bootlin.com>
Subject: [PATCH V3] overview-manual: document hash equivalence
Date: Fri,  7 Jan 2022 20:08:40 +0100	[thread overview]
Message-ID: <20220107190840.784216-1-michael.opdenacker@bootlin.com> (raw)
In-Reply-To: <16C812800894D4AC.11018@lists.yoctoproject.org>

Signed-off-by: Michael Opdenacker <michael.opdenacker@bootlin.com>
---
 documentation/overview-manual/concepts.rst | 126 +++++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/documentation/overview-manual/concepts.rst b/documentation/overview-manual/concepts.rst
index 6f8a3def69..781ba1b070 100644
--- a/documentation/overview-manual/concepts.rst
+++ b/documentation/overview-manual/concepts.rst
@@ -1938,6 +1938,132 @@ another reason why a task-based approach is preferred over a
 recipe-based approach, which would have to install the output from every
 task.
 
+Hash Equivalence
+----------------
+
+The above section explained how BitBake skips the execution of tasks
+which output can already be found in the Shared State cache.
+
+During a build, it may often be the case that the output / result of a task might
+be unchanged despite changes in the task's input values. An example might be
+whitespace changes in some input C code. In project terms, this is what we define
+as "equivalence".
+
+To keep track of such equivalence, BitBake has to manage three hashes
+for each task:
+
+- The *task hash* explained earlier: computed from the recipe metadata,
+  the task code and the task hash values from its dependencies.
+  When changes are made, these task hashes are therefore modified,
+  causing the task to re-execute. The task hashes of tasks depending on this
+  task are therefore modified too, causing the whole dependency
+  chain to re-execute.
+
+- The *output hash*, a new hash computed from the output of Shared State tasks,
+  tasks that save their resulting output to a Shared State tarball.
+  The mapping between the task hash and its output hash is reported
+  to a new *Hash Equivalence* server. This mapping is stored in a database
+  by the server for future reference.
+
+- The *unihash*, a new hash, initially set to the task hash for the task.
+  This is used to track the *unicity* of task output, and we will explain
+  how its value is maintained.
+
+When Hash Equivalence is enabled, BitBake computes the task hash
+for each task by using the unihash of its dependencies, instead
+of their task hash.
+
+Now, imagine that a Shared State task is modified because of a change in
+its code or metadata, or because of a change in its dependencies.
+Since this modifies its task hash, this task will need re-executing.
+Its output hash will therefore be computed again.
+
+Then, the new mapping between the new task hash and its output hash
+will be reported to the Hash Equivalence server. The server will
+let BitBake know whether this output hash is the same as a previously
+reported output hash, for a different task hash.
+
+If the output hash is already known, BitBake will update the task's
+unihash to match the original task hash that generated that output.
+Thanks to this, the depending tasks will keep a previously recorded
+task hash, and BitBake will be able to retrieve their output from
+the Shared State cache, instead of re-executing them. Similarly, the
+output of further downstream tasks can also be retrieved from Shared
+Shate.
+
+If the output hash is unknown, a new entry will be created on the Hash
+Equivalence server, matching the task hash to that output.
+The depending tasks, still having a new task hash because of the
+change, will need to re-execute as expected. The change propagates
+to the depending tasks.
+
+To summarize, when Hash Equivalence is enabled, a change in one of the
+tasks in BitBake's run queue doesn't have to propagate to all the
+downstream tasks that depend on the output of this task, causing a
+full rebuild of such tasks, and so on with the next depending tasks.
+Instead, when the output of this task remains identical to previously
+recorded output, BitBake can safely retrieve all the downstream
+task output from the Shared State cache.
+
+This applies to multiple scenarios:
+
+-  A "trivial" change to a recipe that doesn't impact its generated output,
+   such as whitespace changes, modifications to unused code paths or
+   in the ordering of variables.
+
+-  Shared library updates, for example to fix a security vulnerability.
+   For sure, the programs using such a library should be rebuilt, but
+   their new binaries should remain identical. The corresponding tasks should
+   have a different output hash because of the change in the hash of their
+   library dependency, but thanks to their output being identical, Hash
+   Equivalence will stop the propagation down the dependency chain.
+
+-  Native tool updates. Though the depending tasks should be rebuilt,
+   it's likely that they will generate the same output and be marked
+   as equivalent.
+
+This mechanism is enabled by default in Poky, and is controlled by three
+variables:
+
+-  :term:`bitbake:BB_HASHSERVE`, specifying a local or remote Hash
+   Equivalence server to use.
+
+-  :term:`BB_HASHSERVE_UPSTREAM`, when ``BB_HASHSERVE = "auto"``,
+   allowing to connect the local server to an upstream one.
+
+-  :term:`bitbake:BB_SIGNATURE_HANDLER`, which must be set  to ``OEEquivHash``.
+
+Therefore, the default configuration in Poky corresponds to the
+below settings::
+
+   BB_HASHSERVE = "auto"
+   BB_SIGNATURE_HANDLER = "OEEquivHash"
+
+Rather than starting a local server, another possibility is to rely
+on a Hash Equivalence server on a network, by setting::
+
+   BB_HASHSERVE = "<HOSTNAME>:<PORT>"
+
+.. note::
+
+   The shared Hash Equivalence server needs to be maintained together with the
+   Share State cache. Otherwise, the server could report Shared State hashes
+   that only exist on specific clients.
+
+   We therefore recommend that one Hash Equivalence server be set up to
+   correspond with a given Shared State cache, and to start this server
+   in *read-only mode*, so that it doesn't store equivalences for
+   Shared State caches that are local to clients.
+
+   See the :term:`BB_HASHSERVE` reference for details about starting
+   a Hash Equivalence server.
+
+See the `video <https://www.youtube.com/watch?v=zXEdqGS62Wc>`__
+of Joshua Watt's `Hash Equivalence and Reproducible Builds
+<https://elinux.org/images/3/37/Hash_Equivalence_and_Reproducible_Builds.pdf>`__
+presentation at ELC 2020 for a very synthetic introduction to the
+Hash Equivalence implementation in the Yocto Project.
+
 Automatically Added Runtime Dependencies
 ========================================
 
-- 
2.25.1



       reply	other threads:[~2022-01-07 19:08 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <16C812800894D4AC.11018@lists.yoctoproject.org>
2022-01-07 19:08 ` Michael Opdenacker [this message]
2022-01-10  6:29   ` [docs] [PATCH V3] overview-manual: document hash equivalence Ulrich Ölmann
     [not found]     ` <d0c7cd99-2d79-b109-eb72-f119283832f8@bootlin.com>
2022-01-10 11:25       ` Ulrich Ölmann
     [not found]       ` <23e5e7d4d7436c22e271d52892803187b6988184.camel@rpsys.net>
2022-01-10 13:22         ` Michael Opdenacker
2022-01-11 21:23   ` Joshua Watt
2022-01-13 10:50     ` Michael Opdenacker
2022-01-13 17:17       ` Joshua Watt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220107190840.784216-1-michael.opdenacker@bootlin.com \
    --to=michael.opdenacker@bootlin.com \
    --cc=docs@lists.yoctoproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox