* [Buildroot] [PATCH v2 2/5] autobuild-run: make prepare_build() clean the output directory used for reproducibility testing
2019-08-10 3:58 [Buildroot] [PATCH v2 1/5] autobuild-run: use different output directories for reproducible builds testing Atharva Lele
@ 2019-08-10 3:58 ` Atharva Lele
2019-08-11 13:39 ` Thomas Petazzoni
2019-08-10 3:59 ` [Buildroot] [PATCH v2 3/5] autobuild-run: make diffoscope output a JSON-formatted file as well Atharva Lele
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Atharva Lele @ 2019-08-10 3:58 UTC (permalink / raw)
To: buildroot
The previous patch introduced a new output directory for the second build in a
reproducibile build test. This patch will make sure that it is cleaned before
the subsequent build process begins.
Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
scripts/autobuild-run | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index 21eb47a..69766b2 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -383,6 +383,10 @@ class Builder:
# shutil.rmtree doesn't remove write-protected files
subprocess.call(["rm", "-rf", self.outputdir])
os.mkdir(self.outputdir)
+
+ # If it exists, remove the other output directory used for reproducibility testing
+ if os.path.exists(self.outputdir_2):
+ subprocess.call(["rm", "-rf", self.outputdir_2])
with open(os.path.join(self.outputdir, "branch"), "w") as branchf:
branchf.write(branch)
--
2.22.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* [Buildroot] [PATCH v2 3/5] autobuild-run: make diffoscope output a JSON-formatted file as well
2019-08-10 3:58 [Buildroot] [PATCH v2 1/5] autobuild-run: use different output directories for reproducible builds testing Atharva Lele
2019-08-10 3:58 ` [Buildroot] [PATCH v2 2/5] autobuild-run: make prepare_build() clean the output directory used for reproducibility testing Atharva Lele
@ 2019-08-10 3:59 ` Atharva Lele
2019-08-11 13:41 ` Thomas Petazzoni
2019-08-10 3:59 ` [Buildroot] [PATCH v2 4/5] autobuild-run: initial implementation of get_reproducibility_failure_reason() Atharva Lele
` (2 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Atharva Lele @ 2019-08-10 3:59 UTC (permalink / raw)
To: buildroot
Normal diffoscope output is readable by humans but not really
convenient when working with it in code. JSON can be easily
read and extracted information from.
We still keep the normal text version because it's easy to parse by a human,
but we limit its size to 40KB.
Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
Changes v1 -> v2:
- Save text output as well as JSON
---
scripts/autobuild-run | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index 69766b2..6adfa99 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -440,6 +440,7 @@ class Builder:
"""
reproducible_results = os.path.join(self.outputdir, "results", "reproducible_results")
+ reproducible_results_text = os.path.join(self.outputdir, "results", "reproducible_results_text")
# Using only tar images for now
build_1_image = os.path.join(self.outputdir, "images", "rootfs.tar")
build_2_image = os.path.join(self.outputdir_2, "images", "rootfs.tar")
@@ -453,7 +454,9 @@ class Builder:
prefix = prefix[13:-1]
log_write(self.log, "INFO: running diffoscope on images")
subprocess.call(["diffoscope", build_1_image, build_2_image,
- "--tool-prefix-binutils", prefix], stdout=diff, stderr=self.log)
+ "--tool-prefix-binutils", prefix, "--json", "-",
+ "--text", reproducible_results_text, "--max-text-report-size", "40000"],
+ stdout=diff, stderr=self.log)
else:
log_write(self.log, "INFO: diffoscope not installed, falling back to cmp")
subprocess.call(["cmp", "-b", build_1_image, build_2_image], stdout=diff, stderr=self.log)
--
2.22.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* [Buildroot] [PATCH v2 3/5] autobuild-run: make diffoscope output a JSON-formatted file as well
2019-08-10 3:59 ` [Buildroot] [PATCH v2 3/5] autobuild-run: make diffoscope output a JSON-formatted file as well Atharva Lele
@ 2019-08-11 13:41 ` Thomas Petazzoni
2019-08-12 9:22 ` Atharva Lele
2019-08-13 4:07 ` Atharva Lele
0 siblings, 2 replies; 11+ messages in thread
From: Thomas Petazzoni @ 2019-08-11 13:41 UTC (permalink / raw)
To: buildroot
Hello Atharva,
On Sat, 10 Aug 2019 09:29:00 +0530
Atharva Lele <itsatharva@gmail.com> wrote:
> reproducible_results = os.path.join(self.outputdir, "results", "reproducible_results")
I change the name of the JSON file to:
diffoscope-results.json
> + reproducible_results_text = os.path.join(self.outputdir, "results", "reproducible_results_text")
And the name of the text file to:
diffoscope-results.txt
These names seem more sensible to me, and match better what we're
already using for other files in the build results.
I've applied to buildroot-test with this change. I have another
suggestion below, though.
> # Using only tar images for now
> build_1_image = os.path.join(self.outputdir, "images", "rootfs.tar")
> build_2_image = os.path.join(self.outputdir_2, "images", "rootfs.tar")
> @@ -453,7 +454,9 @@ class Builder:
> prefix = prefix[13:-1]
> log_write(self.log, "INFO: running diffoscope on images")
> subprocess.call(["diffoscope", build_1_image, build_2_image,
> - "--tool-prefix-binutils", prefix], stdout=diff, stderr=self.log)
> + "--tool-prefix-binutils", prefix, "--json", "-",
Instead of outputting the result to stdout, and then redirecting the
stdout to "diff", what about using diffoscope ability to directly
output to a file:
"--json", reproducible_results, "--text", reproducible_results_text
Thanks!
Thomas
--
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Buildroot] [PATCH v2 3/5] autobuild-run: make diffoscope output a JSON-formatted file as well
2019-08-11 13:41 ` Thomas Petazzoni
@ 2019-08-12 9:22 ` Atharva Lele
2019-08-13 4:07 ` Atharva Lele
1 sibling, 0 replies; 11+ messages in thread
From: Atharva Lele @ 2019-08-12 9:22 UTC (permalink / raw)
To: buildroot
On Sun, Aug 11, 2019 at 7:11 PM Thomas Petazzoni
<thomas.petazzoni@bootlin.com> wrote:
>
> Hello Atharva,
>
> On Sat, 10 Aug 2019 09:29:00 +0530
> Atharva Lele <itsatharva@gmail.com> wrote:
>
> > reproducible_results = os.path.join(self.outputdir, "results", "reproducible_results")
>
> I change the name of the JSON file to:
>
> diffoscope-results.json
>
> > + reproducible_results_text = os.path.join(self.outputdir, "results", "reproducible_results_text")
>
> And the name of the text file to:
>
> diffoscope-results.txt
>
> These names seem more sensible to me, and match better what we're
> already using for other files in the build results.
>
Changing the names does make sense, thanks!
> I've applied to buildroot-test with this change. I have another
> suggestion below, though.
>
> > # Using only tar images for now
> > build_1_image = os.path.join(self.outputdir, "images", "rootfs.tar")
> > build_2_image = os.path.join(self.outputdir_2, "images", "rootfs.tar")
> > @@ -453,7 +454,9 @@ class Builder:
> > prefix = prefix[13:-1]
> > log_write(self.log, "INFO: running diffoscope on images")
> > subprocess.call(["diffoscope", build_1_image, build_2_image,
> > - "--tool-prefix-binutils", prefix], stdout=diff, stderr=self.log)
> > + "--tool-prefix-binutils", prefix, "--json", "-",
>
> Instead of outputting the result to stdout, and then redirecting the
> stdout to "diff", what about using diffoscope ability to directly
> output to a file:
>
> "--json", reproducible_results, "--text", reproducible_results_text
>
You're right. It'd be better to have it consistent. I'll prepare a
patch to change it.
> Thanks!
>
> Thomas
> --
> Thomas Petazzoni, CTO, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
Thanks for merging and thanks for the review!
--
Regards,
Atharva Lele
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Buildroot] [PATCH v2 3/5] autobuild-run: make diffoscope output a JSON-formatted file as well
2019-08-11 13:41 ` Thomas Petazzoni
2019-08-12 9:22 ` Atharva Lele
@ 2019-08-13 4:07 ` Atharva Lele
2019-08-13 12:29 ` Thomas Petazzoni
1 sibling, 1 reply; 11+ messages in thread
From: Atharva Lele @ 2019-08-13 4:07 UTC (permalink / raw)
To: buildroot
On Sun, Aug 11, 2019 at 7:11 PM Thomas Petazzoni
<thomas.petazzoni@bootlin.com> wrote:
>
> Hello Atharva,
>
> On Sat, 10 Aug 2019 09:29:00 +0530
> Atharva Lele <itsatharva@gmail.com> wrote:
>
> > reproducible_results = os.path.join(self.outputdir, "results", "reproducible_results")
>
> I change the name of the JSON file to:
>
> diffoscope-results.json
>
> > + reproducible_results_text = os.path.join(self.outputdir, "results", "reproducible_results_text")
>
There's a typo here. Variable is named reproducible_results_txt in the
patch, but in the diffoscope command I have used
reproducible_results_text. I'll send a patch fixing this and the
command consistency.
> And the name of the text file to:
>
> diffoscope-results.txt
>
> These names seem more sensible to me, and match better what we're
> already using for other files in the build results.
>
> I've applied to buildroot-test with this change. I have another
> suggestion below, though.
>
> > # Using only tar images for now
> > build_1_image = os.path.join(self.outputdir, "images", "rootfs.tar")
> > build_2_image = os.path.join(self.outputdir_2, "images", "rootfs.tar")
> > @@ -453,7 +454,9 @@ class Builder:
> > prefix = prefix[13:-1]
> > log_write(self.log, "INFO: running diffoscope on images")
> > subprocess.call(["diffoscope", build_1_image, build_2_image,
> > - "--tool-prefix-binutils", prefix], stdout=diff, stderr=self.log)
> > + "--tool-prefix-binutils", prefix, "--json", "-",
>
> Instead of outputting the result to stdout, and then redirecting the
> stdout to "diff", what about using diffoscope ability to directly
> output to a file:
>
> "--json", reproducible_results, "--text", reproducible_results_text
>
> Thanks!
>
> Thomas
> --
> Thomas Petazzoni, CTO, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
--
Regards,
Atharva Lele
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Buildroot] [PATCH v2 3/5] autobuild-run: make diffoscope output a JSON-formatted file as well
2019-08-13 4:07 ` Atharva Lele
@ 2019-08-13 12:29 ` Thomas Petazzoni
0 siblings, 0 replies; 11+ messages in thread
From: Thomas Petazzoni @ 2019-08-13 12:29 UTC (permalink / raw)
To: buildroot
Hello Atharva,
On Tue, 13 Aug 2019 09:37:27 +0530
Atharva Lele <itsatharva@gmail.com> wrote:
> > > + reproducible_results_text = os.path.join(self.outputdir, "results", "reproducible_results_text")
> >
>
> There's a typo here. Variable is named reproducible_results_txt in the
> patch, but in the diffoscope command I have used
> reproducible_results_text. I'll send a patch fixing this and the
> command consistency.
My bad. I initially started changing the variable names to something
more sensible, such as "diffoscope_results_json" and
"diffoscope_results_txt" and then changed my mind, as it should have
been a separate patch. But apparently, I undid my change incorrectly.
Sorry about this.
Thomas
--
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Buildroot] [PATCH v2 4/5] autobuild-run: initial implementation of get_reproducibility_failure_reason()
2019-08-10 3:58 [Buildroot] [PATCH v2 1/5] autobuild-run: use different output directories for reproducible builds testing Atharva Lele
2019-08-10 3:58 ` [Buildroot] [PATCH v2 2/5] autobuild-run: make prepare_build() clean the output directory used for reproducibility testing Atharva Lele
2019-08-10 3:59 ` [Buildroot] [PATCH v2 3/5] autobuild-run: make diffoscope output a JSON-formatted file as well Atharva Lele
@ 2019-08-10 3:59 ` Atharva Lele
2019-08-10 3:59 ` [Buildroot] [PATCH v2 5/5] autobuild-run: account for reproducibility failures in get_failure_reason() Atharva Lele
2019-08-11 13:39 ` [Buildroot] [PATCH v2 1/5] autobuild-run: use different output directories for reproducible builds testing Thomas Petazzoni
4 siblings, 0 replies; 11+ messages in thread
From: Atharva Lele @ 2019-08-10 3:59 UTC (permalink / raw)
To: buildroot
Analyze the JSON formatted output from diffoscope and check if
the differences are due to a filesystem reproducibility issue
or a package reproducibility issue.
Also, discard the deltas because they might take up too much space.
Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
Changes v1 -> v2:
- Refactor using subfunctions and local variables (suggested by Thomas)
- Added comments (suggested by Thomas)
- Use more pythonic loops (suggested by Thomas)
---
scripts/autobuild-run | 89 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 89 insertions(+)
diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index 6adfa99..1be392a 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -131,6 +131,7 @@ import csv
import docopt
import errno
import hashlib
+import json
import mmap
import multiprocessing
import os
@@ -599,6 +600,94 @@ class Builder:
if reject_results():
return
+ def get_reproducibility_failure_reason(reproducible_results):
+ def split_delta(delta):
+ # Take a delta and split it into added, deleted lines.
+ added = []
+ deleted = []
+ for line in delta:
+ if line.startswith("+"):
+ added.append(line)
+ if line.startswith("-"):
+ deleted.append(line)
+ return added, deleted
+
+ def get_package(sourcef):
+ # Returns which package the source file belongs to.
+ with open(packages_file_list, "r") as packagef:
+ for line in packagef:
+ if sourcef in line:
+ package = line.split(',')[0]
+
+ if package:
+ # Get package version
+ package_info = json.loads(subprocess.check_output(["make", "--no-print-directory",
+ "O=%s" % self.outputdir,
+ "-C", self.srcdir,
+ "%s-show-info" % package]))
+ if "version" in package_info[package]:
+ version = package_info[package]["version"]
+ return [package, version]
+ else:
+ return [package]
+ else:
+ return ["not found"]
+
+ def cleanup(l):
+ # Takes a list and removes data which is redundant (source2) or data
+ # that might take up too much space (like huge diffs).
+ if "unified_diff" in l:
+ l.pop("unified_diff")
+ if "source2" in l:
+ l.pop("source2")
+
+
+ packages_file_list = os.path.join(self.outputdir, "build", "packages-file-list.txt")
+
+ with open(reproducible_results, "r") as reproduciblef:
+ json_data = json.load(reproduciblef)
+
+ if json_data["unified_diff"] == None:
+ # Remove the file list because it is not useful, i.e. it only shows
+ # which files vary, and nothing more.
+ if json_data["details"][0]["source1"] == "file list":
+ json_data["details"].pop(0)
+
+ # Iterate over details in the diffoscope output.
+ for item in json_data["details"]:
+ diff_src = item["source1"]
+ item["package"] = get_package(diff_src)
+
+ # In some cases, diffoscope uses multiple commands to get various
+ # diffs. Due to this, it generates a "details" key for those files
+ # instead of just storing the diff in the "unified_diff" key.
+ if item["unified_diff"] == None:
+ for item_details in item["details"]:
+ diff = item_details["unified_diff"].split("\n")
+ split_deltas = split_delta(diff)
+ item_details["added"] = split_deltas[0][:100]
+ item_details["deleted"] = split_deltas[1][:100]
+ cleanup(item_details)
+ else:
+ diff = item["unified_diff"].split("\n")
+ split_deltas = split_delta(diff)
+ item["added"] = split_deltas[0][:100]
+ item["deleted"] = split_deltas[1][:100]
+ cleanup(item)
+ # We currently just set the reason from first non-reproducible package in the
+ # dictionary.
+ reason = json_data["details"][0]["package"]
+
+ # If there does exist a unified_diff directly for the .tar images, it is probably
+ # a filesystem reproducibility issue.
+ else:
+ reason = ["filesystem"]
+
+ with open(reproducible_results, "w") as reproduciblef:
+ json.dump(json_data, reproduciblef, sort_keys=True, indent=4)
+
+ return reason
+
def get_failure_reason():
# Output is a tuple (package, version), or None.
lastlines = decode_bytes(subprocess.Popen(
--
2.22.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* [Buildroot] [PATCH v2 5/5] autobuild-run: account for reproducibility failures in get_failure_reason()
2019-08-10 3:58 [Buildroot] [PATCH v2 1/5] autobuild-run: use different output directories for reproducible builds testing Atharva Lele
` (2 preceding siblings ...)
2019-08-10 3:59 ` [Buildroot] [PATCH v2 4/5] autobuild-run: initial implementation of get_reproducibility_failure_reason() Atharva Lele
@ 2019-08-10 3:59 ` Atharva Lele
2019-08-11 13:39 ` [Buildroot] [PATCH v2 1/5] autobuild-run: use different output directories for reproducible builds testing Thomas Petazzoni
4 siblings, 0 replies; 11+ messages in thread
From: Atharva Lele @ 2019-08-10 3:59 UTC (permalink / raw)
To: buildroot
Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
scripts/autobuild-run | 29 ++++++++++++++++++++---------
1 file changed, 20 insertions(+), 9 deletions(-)
diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index 1be392a..7fc8d3a 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -690,15 +690,26 @@ class Builder:
def get_failure_reason():
# Output is a tuple (package, version), or None.
- lastlines = decode_bytes(subprocess.Popen(
- ["tail", "-n", "3", os.path.join(self.outputdir, "logfile")],
- stdout=subprocess.PIPE).communicate()[0]).splitlines()
-
- regexp = re.compile(r'make: \*\*\* .*/(?:build|toolchain)/([^/]*)/')
- for line in lastlines:
- m = regexp.search(line)
- if m:
- return m.group(1).rsplit('-', 1)
+ # Output is "package-reproducible" in case of reproducibility failure.
+
+ reproducible_results = os.path.join(self.resultdir, "reproducible_results")
+ if os.path.exists(reproducible_results) and os.stat(reproducible_results).st_size > 0:
+ if self.sysinfo.has("diffoscope"):
+ reason = get_reproducibility_failure_reason(reproducible_results)
+ reason.append("nonreproducible")
+ return reason
+ else:
+ return ["nonreproducible"]
+ else:
+ lastlines = decode_bytes(subprocess.Popen(
+ ["tail", "-n", "3", os.path.join(self.outputdir, "logfile")],
+ stdout=subprocess.PIPE).communicate()[0]).splitlines()
+
+ regexp = re.compile(r'make: \*\*\* .*/(?:build|toolchain)/([^/]*)/')
+ for line in lastlines:
+ m = regexp.search(line)
+ if m:
+ return m.group(1).rsplit('-', 1)
# not found
return None
--
2.22.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* [Buildroot] [PATCH v2 1/5] autobuild-run: use different output directories for reproducible builds testing
2019-08-10 3:58 [Buildroot] [PATCH v2 1/5] autobuild-run: use different output directories for reproducible builds testing Atharva Lele
` (3 preceding siblings ...)
2019-08-10 3:59 ` [Buildroot] [PATCH v2 5/5] autobuild-run: account for reproducibility failures in get_failure_reason() Atharva Lele
@ 2019-08-11 13:39 ` Thomas Petazzoni
4 siblings, 0 replies; 11+ messages in thread
From: Thomas Petazzoni @ 2019-08-11 13:39 UTC (permalink / raw)
To: buildroot
On Sat, 10 Aug 2019 09:28:58 +0530
Atharva Lele <itsatharva@gmail.com> wrote:
> Having different output directories increases variation in the build.
>
> This commit also changes the directory name of the single output directory to
> "output-1", as having varied length names for output directories introduces a
> lot of reproducibility issues.
>
> Official Debian efforts also currently use same length names for different
> directories: "/build/1st/$pkg-$ver" and "/build/$pkg-$ver/2nd"
> Cfr: https://tests.reproducible-builds.org/debian/index_variations.html
>
> Reproducible-Builds has proposed a spec called BUILD_PATH_PREFIX_MAP to help
> with this. However, that is still in the works and not fully solved yet.
> (As told on the #reproducible-builds IRC).
>
> So I think until further work is done on BUILD_PATH_PREFIX_MAP, we will have
> to stay with output directories of same length.
>
> See: https://wiki.debian.org/ReproducibleBuilds/GCC-build-path and
> https://reproducible-builds.org/specs/build-path-prefix-map/
>
> Signed-off-by: Atharva Lele <itsatharva@gmail.com>
> ---
> Changes v1 -> v2:
> - Added more details in commit log (suggested by Thomas)
> ---
> scripts/autobuild-run | 22 +++++++++++++---------
> 1 file changed, 13 insertions(+), 9 deletions(-)
Applied to buildroot-test, thanks!
Thomas
--
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
^ permalink raw reply [flat|nested] 11+ messages in thread