From: Atharva Lele <itsatharva@gmail.com>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH v2 4/5] autobuild-run: initial implementation of get_reproducibility_failure_reason()
Date: Sat, 10 Aug 2019 09:29:01 +0530
Message-ID: <20190810035902.14047-4-itsatharva@gmail.com>
In-Reply-To: <20190810035902.14047-1-itsatharva@gmail.com>

Analyze the JSON-formatted output from diffoscope and check if
the differences are due to a filesystem reproducibility issue
or a package reproducibility issue.

Also, discard the full deltas, keeping at most 100 added and 100
deleted lines per diff, because they might take up too much space.

Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
Changes v1 -> v2:
  - Refactor using subfunctions and local variables (suggested by Thomas)
  - Added comments (suggested by Thomas)
  - Use more pythonic loops (suggested by Thomas)
---
 scripts/autobuild-run | 89 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)
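
For reference, the decision this patch makes can be sketched outside of
autobuild-run as follows. This is only an illustration, assuming the
diffoscope JSON has the shape the patch relies on (a top-level
"unified_diff" plus a "details" list whose entries carry "source1").
The names classify, owner_of, sample and owners, and the paths used in
sample, are made up for the example and do not appear in the patch.
Unlike the patch, the sketch also guards against an empty "details" list.

```python
def classify(report, owner_of):
    """Return ["filesystem"] or the owner of the first differing file.

    report:   parsed diffoscope JSON document (a dict).
    owner_of: hypothetical callable mapping a file path to a list such as
              [package, version]; it stands in for get_package().
    """
    if report.get("unified_diff") is not None:
        # The images themselves differ directly: treat it as a
        # filesystem reproducibility issue.
        return ["filesystem"]

    details = report.get("details") or []
    # Skip the leading "file list" entry, which only says which files vary.
    if details and details[0].get("source1") == "file list":
        details = details[1:]
    if not details:
        return ["not found"]

    # Like the patch, blame the first differing file only.
    return owner_of(details[0]["source1"])


# Made-up input that mimics the assumed JSON shape.
sample = {
    "source1": "output-1/images/rootfs.tar",
    "source2": "output-2/images/rootfs.tar",
    "unified_diff": None,
    "details": [
        {"source1": "file list", "unified_diff": "..."},
        {"source1": "usr/bin/foo", "unified_diff": "@@ -1 +1 @@\n-a\n+b\n"},
    ],
}
owners = {"usr/bin/foo": ["foo", "1.0"]}

reason = classify(sample, lambda path: owners.get(path, ["not found"]))
print(reason)  # ['foo', '1.0']
```

Passing the lookup in as a callable keeps the sketch free of any call
to make, so it can be run without a Buildroot tree.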

diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index 6adfa99..1be392a 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -131,6 +131,7 @@ import csv
 import docopt
 import errno
 import hashlib
+import json
 import mmap
 import multiprocessing
 import os
@@ -599,6 +600,94 @@ class Builder:
         if reject_results():
             return
 
+        def get_reproducibility_failure_reason(reproducible_results):
+            def split_delta(delta):
+                # Take a delta and split it into added, deleted lines.
+                added = []
+                deleted = []
+                for line in delta:
+                    if line.startswith("+"):
+                        added.append(line)
+                    if line.startswith("-"):
+                        deleted.append(line)
+                return added, deleted
+
+            def get_package(sourcef):
+                # Returns which package the source file belongs to.
+                package = None  # stays None if sourcef is not in the file list
+                with open(packages_file_list, "r") as packagef:
+                    for line in packagef:
+                        if sourcef in line:
+                            package = line.split(',')[0]
+                if package:
+                    # Get package version
+                    package_info = json.loads(subprocess.check_output(["make", "--no-print-directory",
+                                                                       "O=%s" % self.outputdir,
+                                                                       "-C", self.srcdir,
+                                                                       "%s-show-info" % package]))
+                    if "version" in package_info[package]:
+                        version = package_info[package]["version"]
+                        return [package, version]
+                    else:
+                        return [package]
+                else:
+                    return ["not found"]
+
+            def cleanup(l):
+                # Takes a dict and removes data which is redundant (source2) or data
+                # that might take up too much space (like huge diffs).
+                if "unified_diff" in l:
+                    l.pop("unified_diff")
+                if "source2" in l:
+                    l.pop("source2")
+
+
+            packages_file_list = os.path.join(self.outputdir, "build", "packages-file-list.txt")
+
+            with open(reproducible_results, "r") as reproduciblef:
+                json_data = json.load(reproduciblef)
+
+            if json_data["unified_diff"] is None:
+                # Remove the file list because it is not useful, i.e. it only shows
+                # which files vary, and nothing more.
+                if json_data["details"][0]["source1"] == "file list":
+                    json_data["details"].pop(0)
+
+                # Iterate over details in the diffoscope output.
+                for item in json_data["details"]:
+                    diff_src = item["source1"]
+                    item["package"] = get_package(diff_src)
+
+                    # In some cases, diffoscope uses multiple commands to get various
+                    # diffs. Due to this, it generates a "details" key for those files
+                    # instead of just storing the diff in the "unified_diff" key.
+                    if item["unified_diff"] is None:
+                        for item_details in item["details"]:
+                            diff = item_details["unified_diff"].split("\n")
+                            split_deltas = split_delta(diff)
+                            item_details["added"] = split_deltas[0][:100]
+                            item_details["deleted"] = split_deltas[1][:100]
+                            cleanup(item_details)
+                    else:
+                        diff = item["unified_diff"].split("\n")
+                        split_deltas = split_delta(diff)
+                        item["added"] = split_deltas[0][:100]
+                        item["deleted"] = split_deltas[1][:100]
+                    cleanup(item)
+                # We currently just set the reason from first non-reproducible package in the
+                # dictionary.
+                reason = json_data["details"][0]["package"]
+
+                # If there does exist a unified_diff directly for the .tar images, it is probably
+                # a filesystem reproducibility issue.
+            else:
+                reason = ["filesystem"]
+
+            with open(reproducible_results, "w") as reproduciblef:
+                json.dump(json_data, reproduciblef, sort_keys=True, indent=4)
+
+            return reason
+
         def get_failure_reason():
             # Output is a tuple (package, version), or None.
             lastlines = decode_bytes(subprocess.Popen(
-- 
2.22.0
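
The per-item trimming done in the patch (split the delta, keep at most
100 lines of each kind, drop "unified_diff" and "source2") can be
sketched in isolation like this. It is a minimal illustration and not
the code from the patch: summarize, MAX_LINES and the sample entry are
names invented here, and the diff text is made up.

```python
MAX_LINES = 100  # same cap the patch applies with [:100]


def split_delta(delta_lines):
    """Split diff lines into (added, deleted) by their leading marker."""
    added = [line for line in delta_lines if line.startswith("+")]
    deleted = [line for line in delta_lines if line.startswith("-")]
    return added, deleted


def summarize(item):
    """Replace the full diff of one entry with capped added/deleted lists.

    The entry is modified in place and also returned for convenience.
    """
    diff = item.pop("unified_diff", None)
    item.pop("source2", None)  # redundant with source1 in this context
    if diff is not None:
        added, deleted = split_delta(diff.split("\n"))
        item["added"] = added[:MAX_LINES]
        item["deleted"] = deleted[:MAX_LINES]
    return item


# Made-up entry in the shape the patch expects from diffoscope.
entry = {
    "source1": "usr/bin/foo",
    "source2": "usr/bin/foo",
    "unified_diff": "@@ -1,2 +1,2 @@\n same\n-old line\n+new line\n",
}
result = summarize(entry)
print(result["added"], result["deleted"])  # ['+new line'] ['-old line']
```

The hunk header and context lines start with "@" or a space, so they
end up in neither list.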
