From: Atharva Lele <itsatharva@gmail.com>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH v4 2/5] autobuild-run: initial implementation of get_reproducibility_failure_reason()
Date: Tue, 20 Aug 2019 20:22:28 +0530
Message-ID: <20190820145231.15507-2-itsatharva@gmail.com>
In-Reply-To: <20190820145231.15507-1-itsatharva@gmail.com>

Analyze the JSON-formatted output from diffoscope and determine whether
the differences are due to a filesystem reproducibility issue or a
package reproducibility issue.

Also discard the full deltas, keeping at most the first 100 added and
100 deleted lines of each, because they might take up too much space.

Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
Changes v2 -> v4:
- Change if-else to try-except
- remove blank line
Changes v1 -> v2:
- Refactor using subfunctions and local variables (suggested by Thomas)
- Added comments (suggested by Thomas)
- Use more pythonic loops (suggested by Thomas)
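
Illustration only, not part of the commit: a minimal standalone sketch of
the trimming this patch applies to one entry of diffoscope's "details"
list. The sample entry, the MAX_LINES constant and the trim_item() helper
name are made up for this note; only the "source1", "source2" and
"unified_diff" keys and the 100-line cap are taken from the patch below.

```python
import json

MAX_LINES = 100  # assumed name; mirrors the [:100] slices in the patch


def split_delta(delta):
    """Split unified-diff lines into (added, deleted) lists."""
    added = [line for line in delta if line.startswith("+")]
    deleted = [line for line in delta if line.startswith("-")]
    return added, deleted


def trim_item(item):
    """Hypothetical helper: swap the full diff for capped added/deleted lists."""
    added, deleted = split_delta(item["unified_diff"].split("\n"))
    item["added"] = added[:MAX_LINES]
    item["deleted"] = deleted[:MAX_LINES]
    # Drop the bulky diff and the redundant second source name.
    item.pop("unified_diff", None)
    item.pop("source2", None)
    return item


# Made-up sample, shaped like one element of the "details" list.
sample = {
    "source1": "./usr/bin/foo",
    "source2": "./usr/bin/foo",
    "unified_diff": "@@ -1,2 +1,2 @@\n-built at 10:00\n+built at 10:05\n same line",
}

trimmed = trim_item(sample)
print(json.dumps(trimmed, sort_keys=True, indent=4))
```

After trimming, the entry keeps "source1" plus the short "added" and
"deleted" lists, which is what ends up written back to the results file.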
---
scripts/autobuild-run | 88 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index 99b57dd..384cf73 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -131,6 +131,7 @@ import csv
 import docopt
 import errno
 import hashlib
+import json
 import mmap
 import multiprocessing
 import os
@@ -599,6 +600,93 @@ class Builder:
         if reject_results():
             return
 
+        def get_reproducibility_failure_reason(reproducible_results):
+            def split_delta(delta):
+                # Take a delta and split it into added, deleted lines.
+                added = []
+                deleted = []
+                for line in delta:
+                    if line.startswith("+"):
+                        added.append(line)
+                    if line.startswith("-"):
+                        deleted.append(line)
+                return added, deleted
+
+            def get_package(sourcef):
+                # Returns which package the source file belongs to.
+                with open(packages_file_list, "r") as packagef:
+                    for line in packagef:
+                        if sourcef in line:
+                            package = line.split(',')[0]
+
+                try:
+                    # Get package version
+                    package_info = json.loads(subprocess.check_output(["make", "--no-print-directory",
+                                                                       "O=%s" % self.outputdir,
+                                                                       "-C", self.srcdir,
+                                                                       "%s-show-info" % package]))
+                    if "version" in package_info[package]:
+                        version = package_info[package]["version"]
+                        return [package, version]
+                    else:
+                        return [package]
+                except:
+                    return ["not found"]
+
+            def cleanup(l):
+                # Takes a list and removes data which is redundant (source2) or data
+                # that might take up too much space (like huge diffs).
+                if "unified_diff" in l:
+                    l.pop("unified_diff")
+                if "source2" in l:
+                    l.pop("source2")
+
+            packages_file_list = os.path.join(self.outputdir, "build", "packages-file-list.txt")
+
+            with open(reproducible_results, "r") as reproduciblef:
+                json_data = json.load(reproduciblef)
+
+            if json_data["unified_diff"] == None:
+                # Remove the file list because it is not useful, i.e. it only shows
+                # which files vary, and nothing more.
+                if json_data["details"][0]["source1"] == "file list":
+                    json_data["details"].pop(0)
+
+                # Iterate over details in the diffoscope output.
+                for item in json_data["details"]:
+                    diff_src = item["source1"]
+                    item["package"] = get_package(diff_src)
+
+                    # In some cases, diffoscope uses multiple commands to get various
+                    # diffs. Due to this, it generates a "details" key for those files
+                    # instead of just storing the diff in the "unified_diff" key.
+                    if item["unified_diff"] == None:
+                        for item_details in item["details"]:
+                            diff = item_details["unified_diff"].split("\n")
+                            split_deltas = split_delta(diff)
+                            item_details["added"] = split_deltas[0][:100]
+                            item_details["deleted"] = split_deltas[1][:100]
+                            cleanup(item_details)
+                    else:
+                        diff = item["unified_diff"].split("\n")
+                        split_deltas = split_delta(diff)
+                        item["added"] = split_deltas[0][:100]
+                        item["deleted"] = split_deltas[1][:100]
+                        cleanup(item)
+                # We currently just set the reason from first non-reproducible package in the
+                # dictionary.
+                reason = json_data["details"][0]["package"]
+
+            # If there does exist a unified_diff directly for the .tar images, it is probably
+            # a filesystem reproducibility issue.
+            else:
+                reason = ["filesystem"]
+
+            with open(reproducible_results, "w") as reproduciblef:
+                json.dump(json_data, reproduciblef, sort_keys=True, indent=4)
+
+            return reason
+
         def get_failure_reason():
             # Output is a tuple (package, version), or None.
             lastlines = decode_bytes(subprocess.Popen(
--
2.22.0