Buildroot Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [Buildroot] [PATCH v3 1/3] autobuild-run: maintain consistency in diffoscope command
@ 2019-08-13  4:12 Atharva Lele
  2019-08-13  4:12 ` [Buildroot] [PATCH v3 2/3] autobuild-run: initial implementation of get_reproducibility_failure_reason() Atharva Lele
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Atharva Lele @ 2019-08-13  4:12 UTC (permalink / raw)
  To: buildroot

Since we are outputting to two file types, we can't just redirect stdout.
Instead, we use diffoscope's ability to output to a file to maintain
consistency in the command.

This patch also fixes a typo in the command. For text output,
it should be reproducibility_results_txt instead of
reproducibility_results_text.

Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
 scripts/autobuild-run | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index dbb497c..ead81a0 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -454,9 +454,9 @@ class Builder:
                 prefix = prefix[13:-1]
                 log_write(self.log, "INFO: running diffoscope on images")
                 subprocess.call(["diffoscope", build_1_image, build_2_image,
-                                 "--tool-prefix-binutils", prefix, "--json", "-",
-                                 "--text", reproducible_results_text, "--max-text-report-size", "40000"],
-                                stdout=diff, stderr=self.log)
+                                 "--tool-prefix-binutils", prefix, "--json", reproducible_results,
+                                 "--text", reproducible_results_txt, "--max-text-report-size", "40000"],
+                                stderr=self.log)
             else:
                 log_write(self.log, "INFO: diffoscope not installed, falling back to cmp")
                 subprocess.call(["cmp", "-b", build_1_image, build_2_image], stdout=diff, stderr=self.log)
-- 
2.22.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Buildroot] [PATCH v3 2/3] autobuild-run: initial implementation of get_reproducibility_failure_reason()
  2019-08-13  4:12 [Buildroot] [PATCH v3 1/3] autobuild-run: maintain consistency in diffoscope command Atharva Lele
@ 2019-08-13  4:12 ` Atharva Lele
  2019-08-13  4:12 ` [Buildroot] [PATCH v3 3/3] autobuild-run: account for reproducibility failures in get_failure_reason() Atharva Lele
  2019-08-13 14:19 ` [Buildroot] [PATCH v3 1/3] autobuild-run: maintain consistency in diffoscope command Thomas Petazzoni
  2 siblings, 0 replies; 5+ messages in thread
From: Atharva Lele @ 2019-08-13  4:12 UTC (permalink / raw)
  To: buildroot

Analyze the JSON formatted output from diffoscope and check if
the differences are due to a filesystem reproducibility issue
or a package reproducibility issue.

Also, discard the deltas because they might take up too much space.

Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
Changes v1 -> v2:
  - Refactor using subfunctions and local variables (suggested by Thomas)
  - Added comments (suggested by Thomas)
  - Use more pythonic loops (suggested by Thomas)
---
 scripts/autobuild-run | 89 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index ead81a0..9b8983f 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -131,6 +131,7 @@ import csv
 import docopt
 import errno
 import hashlib
+import json
 import mmap
 import multiprocessing
 import os
@@ -599,6 +600,94 @@ class Builder:
         if reject_results():
             return
 
+        def get_reproducibility_failure_reason(reproducible_results):
+            def split_delta(delta):
+                # Take a delta and split it into added, deleted lines.
+                added = []
+                deleted = []
+                for line in delta:
+                    if line.startswith("+"):
+                        added.append(line)
+                    if line.startswith("-"):
+                        deleted.append(line)
+                return added, deleted
+
+            def get_package(sourcef):
+                # Returns which package the source file belongs to.
+                with open(packages_file_list, "r") as packagef:
+                    for line in packagef:
+                        if sourcef in line:
+                            package = line.split(',')[0]
+
+                if package:
+                    # Get package version
+                    package_info = json.loads(subprocess.check_output(["make", "--no-print-directory",
+                                                                       "O=%s" % self.outputdir,
+                                                                       "-C", self.srcdir,
+                                                                       "%s-show-info" % package]))
+                    if "version" in package_info[package]:
+                        version = package_info[package]["version"]
+                        return [package, version]
+                    else:
+                        return [package]
+                else:
+                    return ["not found"]
+
+            def cleanup(l):
+                # Takes a list and removes data which is redundant (source2) or data
+                # that might take up too much space (like huge diffs).
+                if "unified_diff" in l:
+                    l.pop("unified_diff")
+                if "source2" in l:
+                    l.pop("source2")
+
+
+            packages_file_list = os.path.join(self.outputdir, "build", "packages-file-list.txt")
+
+            with open(reproducible_results, "r") as reproduciblef:
+                json_data = json.load(reproduciblef)
+
+            if json_data["unified_diff"] == None:
+                # Remove the file list because it is not useful, i.e. it only shows
+                # which files vary, and nothing more.
+                if json_data["details"][0]["source1"] == "file list":
+                    json_data["details"].pop(0)
+
+                # Iterate over details in the diffoscope output.
+                for item in json_data["details"]:
+                    diff_src = item["source1"]
+                    item["package"] = get_package(diff_src)
+
+                    # In some cases, diffoscope uses multiple commands to get various
+                    # diffs. Due to this, it generates a "details" key for those files
+                    # instead of just storing the diff in the "unified_diff" key.
+                    if item["unified_diff"] == None:
+                        for item_details in item["details"]:
+                            diff = item_details["unified_diff"].split("\n")
+                            split_deltas = split_delta(diff)
+                            item_details["added"] = split_deltas[0][:100]
+                            item_details["deleted"] = split_deltas[1][:100]
+                            cleanup(item_details)
+                    else:
+                        diff = item["unified_diff"].split("\n")
+                        split_deltas = split_delta(diff)
+                        item["added"] = split_deltas[0][:100]
+                        item["deleted"] = split_deltas[1][:100]
+                    cleanup(item)
+                # We currently just set the reason from first non-reproducible package in the
+                # dictionary.
+                reason = json_data["details"][0]["package"]
+
+                # If there does exist a unified_diff directly for the .tar images, it is probably
+                # a filesystem reproducibility issue.
+            else:
+                reason = ["filesystem"]
+
+            with open(reproducible_results, "w") as reproduciblef:
+                json.dump(json_data, reproduciblef, sort_keys=True, indent=4)
+
+            return reason
+
         def get_failure_reason():
             # Output is a tuple (package, version), or None.
             lastlines = decode_bytes(subprocess.Popen(
-- 
2.22.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Buildroot] [PATCH v3 3/3] autobuild-run: account for reproducibility failures in get_failure_reason()
  2019-08-13  4:12 [Buildroot] [PATCH v3 1/3] autobuild-run: maintain consistency in diffoscope command Atharva Lele
  2019-08-13  4:12 ` [Buildroot] [PATCH v3 2/3] autobuild-run: initial implementation of get_reproducibility_failure_reason() Atharva Lele
@ 2019-08-13  4:12 ` Atharva Lele
  2019-08-13 14:19 ` [Buildroot] [PATCH v3 1/3] autobuild-run: maintain consistency in diffoscope command Thomas Petazzoni
  2 siblings, 0 replies; 5+ messages in thread
From: Atharva Lele @ 2019-08-13  4:12 UTC (permalink / raw)
  To: buildroot

Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
Changes v1 -> v3:
  - Account for changed name of diffoscope output
---
 scripts/autobuild-run | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index 9b8983f..eba16b2 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -690,15 +690,26 @@ class Builder:
 
         def get_failure_reason():
             # Output is a tuple (package, version), or None.
-            lastlines = decode_bytes(subprocess.Popen(
-                ["tail", "-n", "3", os.path.join(self.outputdir, "logfile")],
-                stdout=subprocess.PIPE).communicate()[0]).splitlines()
-
-            regexp = re.compile(r'make: \*\*\* .*/(?:build|toolchain)/([^/]*)/')
-            for line in lastlines:
-                m = regexp.search(line)
-                if m:
-                    return m.group(1).rsplit('-', 1)
+            # Output is "package-reproducible" in case of reproducibility failure.
+
+            reproducible_results = os.path.join(self.resultdir, "diffoscope-results.json")
+            if os.path.exists(reproducible_results) and os.stat(reproducible_results).st_size > 0:
+                if self.sysinfo.has("diffoscope"):
+                    reason = get_reproducibility_failure_reason(reproducible_results)
+                    reason.append("nonreproducible")
+                    return reason
+                else:
+                    return ["nonreproducible"]
+            else:
+                lastlines = decode_bytes(subprocess.Popen(
+                    ["tail", "-n", "3", os.path.join(self.outputdir, "logfile")],
+                    stdout=subprocess.PIPE).communicate()[0]).splitlines()
+
+                regexp = re.compile(r'make: \*\*\* .*/(?:build|toolchain)/([^/]*)/')
+                for line in lastlines:
+                    m = regexp.search(line)
+                    if m:
+                        return m.group(1).rsplit('-', 1)
 
             # not found
             return None
-- 
2.22.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Buildroot] [PATCH v3 1/3] autobuild-run: maintain consistency in diffoscope command
  2019-08-13  4:12 [Buildroot] [PATCH v3 1/3] autobuild-run: maintain consistency in diffoscope command Atharva Lele
  2019-08-13  4:12 ` [Buildroot] [PATCH v3 2/3] autobuild-run: initial implementation of get_reproducibility_failure_reason() Atharva Lele
  2019-08-13  4:12 ` [Buildroot] [PATCH v3 3/3] autobuild-run: account for reproducibility failures in get_failure_reason() Atharva Lele
@ 2019-08-13 14:19 ` Thomas Petazzoni
  2019-08-16 13:56   ` Atharva Lele
  2 siblings, 1 reply; 5+ messages in thread
From: Thomas Petazzoni @ 2019-08-13 14:19 UTC (permalink / raw)
  To: buildroot

Hello Atharva,

On Tue, 13 Aug 2019 09:42:04 +0530
Atharva Lele <itsatharva@gmail.com> wrote:

> Since we are outputting to two file types, we can't just redirect stdout.
> Instead, we use diffoscope's ability to output to a file to maintain
> consistency in the command.
> 
> This patch also fixes a typo in the command. For text output,
> it should be reproducibility_results_txt instead of
> reproducibility_results_text.
> 
> Signed-off-by: Atharva Lele <itsatharva@gmail.com>
> ---
>  scripts/autobuild-run | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Applied to buildroot-test, thanks. However, the code running diffoscope
is still executed inside a with open(reproducible_results, 'w') as
diff, which is no longer needed, as diffoscope outputs directly to the
reproducible_results file.

You should move that this file opening inside the else condition which
is used when diffoscope is not available.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Buildroot] [PATCH v3 1/3] autobuild-run: maintain consistency in diffoscope command
  2019-08-13 14:19 ` [Buildroot] [PATCH v3 1/3] autobuild-run: maintain consistency in diffoscope command Thomas Petazzoni
@ 2019-08-16 13:56   ` Atharva Lele
  0 siblings, 0 replies; 5+ messages in thread
From: Atharva Lele @ 2019-08-16 13:56 UTC (permalink / raw)
  To: buildroot

On Tue, Aug 13, 2019 at 7:49 PM Thomas Petazzoni
<thomas.petazzoni@bootlin.com> wrote:
>
> Hello Atharva,
>
> On Tue, 13 Aug 2019 09:42:04 +0530
> Atharva Lele <itsatharva@gmail.com> wrote:
>
> > Since we are outputting to two file types, we can't just redirect stdout.
> > Instead, we use diffoscope's ability to output to a file to maintain
> > consistency in the command.
> >
> > This patch also fixes a typo in the command. For text output,
> > it should be reproducibility_results_txt instead of
> > reproducibility_results_text.
> >
> > Signed-off-by: Atharva Lele <itsatharva@gmail.com>
> > ---
> >  scripts/autobuild-run | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
>
> Applied to buildroot-test, thanks. However, the code running diffoscope
> is still executed inside a with open(reproducible_results, 'w') as
> diff, which is no longer needed, as diffoscope outputs directly to the
> reproducible_results file.
>
> You should move that this file opening inside the else condition which
> is used when diffoscope is not available.
>

Oh right, slipped my mind. I'll send a patch to fix it!

Apologies for the late reply, but your email got buried between
message delivery failures since emails to Yann are bouncing.

> Best regards,
>
> Thomas
> --
> Thomas Petazzoni, CTO, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com

Thank you for the review!

-- 
Regards,
Atharva Lele

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-08-16 13:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-08-13  4:12 [Buildroot] [PATCH v3 1/3] autobuild-run: maintain consistency in diffoscope command Atharva Lele
2019-08-13  4:12 ` [Buildroot] [PATCH v3 2/3] autobuild-run: initial implementation of get_reproducibility_failure_reason() Atharva Lele
2019-08-13  4:12 ` [Buildroot] [PATCH v3 3/3] autobuild-run: account for reproducibility failures in get_failure_reason() Atharva Lele
2019-08-13 14:19 ` [Buildroot] [PATCH v3 1/3] autobuild-run: maintain consistency in diffoscope command Thomas Petazzoni
2019-08-16 13:56   ` Atharva Lele

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox