* [PATCH i-g-t v10] igt-runner fact checking
2024-11-02 11:37 [PATCH i-g-t] " Peter Senna Tschudin
@ 2024-12-05 4:54 ` Peter Senna Tschudin
2024-12-05 9:08 ` Piatkowski, Dominik Karol
2024-12-06 11:42 ` Kamil Konieczny
0 siblings, 2 replies; 13+ messages in thread
From: Peter Senna Tschudin @ 2024-12-05 4:54 UTC (permalink / raw)
To: igt-dev@lists.freedesktop.org
Cc: Ryszard Knop, Janusz Krzysztofik, Zbigniew Kempczyński,
Lucas De Marchi, luciano.coelho, nirmoy.das, stuart.summers,
himal.prasad.ghimiray, dominik.karol.piatkowski,
katarzyna.piecielska
When using igt-runner, collect facts before each test and after the
last test, and report when facts change. The facts are:
- GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
- Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
- Kernel taints: kernel.is_tainted.taint_warn: true
- GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
This change imposes little execution overhead and adds just a few
lines of logging. The facts will be printed on normal igt-runner
output. Here is a real example from our CI showing
hotreplug-lateclose changing the DRM card number and tainting the
kernel on the abort path:
[245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose)
[245.383596] Starting subtest: hotreplug-lateclose
[249.843361] Aborting: Lockdep not active
[249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
[249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
[249.859075] Closing watchdogs
CC: Ryszard Knop <ryszard.knop@intel.com>
CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
CC: Lucas De Marchi <lucas.demarchi@intel.com>
CC: luciano.coelho@intel.com
CC: nirmoy.das@intel.com
CC: stuart.summers@intel.com
CC: himal.prasad.ghimiray@intel.com
CC: dominik.karol.piatkowski@intel.com
CC: katarzyna.piecielska@intel.com
Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
---
Re-sending as the mailing list server rejected the message
yesterday due to a probably full disk...
v10:
- fix memory leaks from asprintf (Thank you Dominik Karol!)
- fix comments for consistency (Thank you Dominik Karol!)
v9:
- do not report new hardware when loading/unloading kmod changes
the string of the GPU name. I accidentally reintroduced this
issue when refactoring to use linked lists.
- add tools/lsfacts: 9 lines of code that print either the facts
or that no facts were found.
- fix code comments describing functions
- fix white space issues
v8:
- fix white space issues
v7:
- refactor to use linked lists provided by igt_lists
- Added function arguments to code comments
- updated commit message
v6:
- sort includes in igt_facts.c alphabetically
- add facts for kernel taints using igt_kernel_tainted() and
igt_explain_taints()
v5:
- fix the broken patch format from v4
v4:
- fix a bug on delete_fact()
- drop glib and calls to g_ functions
- change commit message to indicate that report only on fact changes
- use consistent format for reporting changes
- fix SPDX header format
v3:
- refreshed commit message
- changed format SPDX string
- removed license text
- replace last_test assignment when null by two ternary operators
- added function descriptions following example found elsewhere in
the code
- added igt_assert to catch failures to realloc()
v2:
- add lib/tests/igt_facts.c for basic unit testing
- bugfix: do not report a new gpu when the driver changes the gpu name
- bugfix: do not report the pci_id twice on the gpu name
lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++
lib/igt_facts.h | 47 +++
lib/meson.build | 1 +
lib/tests/igt_facts.c | 15 +
lib/tests/meson.build | 1 +
runner/executor.c | 10 +
tools/lsfacts.c | 25 ++
tools/meson.build | 1 +
8 files changed, 855 insertions(+)
create mode 100644 lib/igt_facts.c
create mode 100644 lib/igt_facts.h
create mode 100644 lib/tests/igt_facts.c
create mode 100644 tools/lsfacts.c
diff --git a/lib/igt_facts.c b/lib/igt_facts.c
new file mode 100644
index 000000000..4749d3417
--- /dev/null
+++ b/lib/igt_facts.c
@@ -0,0 +1,755 @@
+// SPDX-License-Identifier: MIT
+// Copyright © 2024 Intel Corporation
+
+#include <ctype.h>
+#include <libudev.h>
+#include <stdio.h>
+#include <sys/time.h>
+#include <time.h>
+
+#include "igt_core.h"
+#include "igt_device_scan.h"
+#include "igt_facts.h"
+#include "igt_kmod.h"
+#include "igt_list.h"
+#include "igt_taints.h"
+
+static struct igt_list_head igt_facts_list_drm_card_head;
+static struct igt_list_head igt_facts_list_kmod_head;
+static struct igt_list_head igt_facts_list_ktaint_head;
+static struct igt_list_head igt_facts_list_pci_gpu_head;
+
+
+/**
+ * igt_facts_lists_init:
+ *
+ * Initialize igt_facts linked lists.
+ *
+ * Returns: void
+ */
+void igt_facts_lists_init(void)
+{
+ IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head);
+ IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head);
+ IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head);
+ IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head);
+}
+
+
+/**
+ * igt_facts_log:
+ * @last_test: name of the test that triggered the fact
+ * @name: name of the fact
+ * @new_value: new value of the fact
+ * @old_value: old value of the fact
+ *
+ * Reports fact changes:
+ * - new fact: if old_value is NULL and new_value is not NULL
+ * - deleted fact: if new_value is NULL and old_value is not NULL
+ * - changed fact: if new_value is different from old_value
+ *
+ * Returns: void
+ */
+static void igt_facts_log(const char *last_test, const char *name,
+ const char *new_value, const char *old_value)
+{
+ struct timespec uptime_ts;
+ char *uptime = NULL;
+ const char *before_tests = "before any test";
+
+ if (old_value == NULL && new_value == NULL)
+ return;
+
+ if (clock_gettime(CLOCK_BOOTTIME, &uptime_ts) != 0)
+ return;
+
+ asprintf(&uptime,
+ "%ld.%06ld",
+ uptime_ts.tv_sec,
+ uptime_ts.tv_nsec / 1000);
+
+ /* New fact */
+ if (old_value == NULL && new_value != NULL) {
+ igt_info("[%s] [FACT %s] new: %s: %s\n",
+ uptime,
+ last_test ? last_test : before_tests,
+ name,
+ new_value);
+ goto out;
+ }
+
+ /* Update fact */
+ if (old_value != NULL && new_value != NULL) {
+ igt_info("[%s] [FACT %s] changed: %s: %s -> %s\n",
+ uptime,
+ last_test ? last_test : before_tests,
+ name,
+ old_value,
+ new_value);
+ goto out;
+ }
+
+ /* Deleted fact */
+ if (old_value != NULL && new_value == NULL) {
+ igt_info("[%s] [FACT %s] deleted: %s: %s\n",
+ uptime,
+ last_test ? last_test : before_tests,
+ name,
+ old_value);
+ goto out;
+ }
+
+out:
+ free(uptime);
+}
+
+/**
+ * igt_facts_list_get:
+ * @name: name of the fact to be added
+ * @head: head of the list
+ *
+ * Get a fact from the list.
+ *
+ * Returns: pointer to the fact if found, NULL otherwise
+ *
+ */
+static igt_fact *igt_facts_list_get(const char *name,
+ struct igt_list_head *head)
+{
+ igt_fact *fact = NULL;
+
+ if (igt_list_empty(head))
+ return NULL;
+
+ igt_list_for_each_entry(fact, head, link) {
+ if (strcmp(fact->name, name) == 0)
+ return fact;
+ }
+ return NULL;
+}
+
+/**
+ * igt_facts_list_del:
+ * @name: name of the fact to be added
+ * @head: head of the list
+ * @last_test: name of the last test
+ * @log: bool indicating if the delete operation should be logged
+ *
+ * Delete a fact from the list.
+ *
+ * Returns: bool indicating if fact was deleted from the list
+ *
+ */
+static bool igt_facts_list_del(const char *name,
+ struct igt_list_head *head,
+ const char *last_test,
+ bool log)
+{
+ igt_fact *fact = NULL;
+
+ if (igt_list_empty(head))
+ return false;
+
+ igt_list_for_each_entry(fact, head, link) {
+ if (strcmp(fact->name, name) == 0) {
+ if (log)
+ igt_facts_log(last_test, fact->name,
+ NULL, fact->value);
+
+ igt_list_del(&fact->link);
+ free(fact->name);
+ free(fact->value);
+ free(fact->last_test);
+ free(fact);
+ return true;
+ }
+ }
+ return false;
+}
+
+/**
+ * igt_facts_list_add:
+ * @name: name of the fact to be added
+ * @value: value of the fact to be added
+ * @last_test: name of the last test
+ * @head: head of the list
+ *
+ * Returns: bool indicating if fact was added to the list
+ *
+ */
+static bool igt_facts_list_add(const char *name,
+ const char *value,
+ const char *last_test,
+ struct igt_list_head *head)
+{
+ igt_fact *new_fact = NULL, *old_fact = NULL;
+ bool logged = false;
+
+ if (name == NULL || value == NULL)
+ return false;
+
+ old_fact = igt_facts_list_get(name, head);
+ if (old_fact) {
+ if (strcmp(old_fact->value, value) == 0) {
+ old_fact->present = true;
+ return false;
+ }
+ igt_facts_log(last_test, name, value, old_fact->value);
+ logged = true;
+ igt_facts_list_del(name, head, last_test, false);
+ }
+
+ new_fact = malloc(sizeof(igt_fact));
+ if (new_fact == NULL)
+ return false;
+
+ new_fact->name = strdup(name);
+ new_fact->value = strdup(value);
+ new_fact->last_test = last_test ? strdup(last_test) : NULL;
+ new_fact->present = true;
+
+ if (!logged)
+ igt_facts_log(last_test, name, value, NULL);
+
+ igt_list_add(&new_fact->link, head);
+
+ return true;
+}
+
+/**
+ * igt_facts_list_mark:
+ * @head: head of the list
+ *
+ * Mark all facts in the list as not present. Opted for the mark and sweep
+ * design pattern due to its simplicity and efficiency.
+ *
+ * Returns: void
+ */
+static void igt_facts_list_mark(struct igt_list_head *head)
+{
+ igt_fact *fact = NULL;
+
+ if (igt_list_empty(head))
+ return;
+
+ igt_list_for_each_entry(fact, head, link)
+ fact->present = false;
+}
+
+/**
+ * igt_facts_list_sweep:
+ * @head: head of the list
+ * @last_test: name of the last test
+ *
+ * Sweep the list and delete all facts that are not present. Opted for the mark
+ * and sweep design pattern due to its simplicity and efficiency.
+ *
+ * Returns: void
+ */
+static void igt_facts_list_sweep(struct igt_list_head *head,
+ const char *last_test)
+{
+ igt_fact *fact = NULL, *tmp = NULL;
+
+ if (igt_list_empty(head))
+ return;
+
+ igt_list_for_each_entry_safe(fact, tmp, head, link)
+ if (!fact->present)
+ igt_facts_list_del(fact->name, head, last_test, true);
+}
+
+/**
+ * igt_facts_list_mark_and_sweep:
+ * @head: head of the list
+ *
+ * Clean up the list using mark and sweep. Opted for the mark and sweep
+ * design pattern due to its simplicity and efficiency.
+ *
+ * Returns: void
+ */
+static void igt_facts_list_mark_and_sweep(struct igt_list_head *head)
+{
+ igt_facts_list_mark(head);
+ igt_facts_list_sweep(head, NULL);
+}
+
+/**
+ * igt_facts_are_all_lists_empty:
+ *
+ * Returns true if all lists are empty. Used by the tool lsfacts.
+ *
+ * Returns: bool
+ */
+bool igt_facts_are_all_lists_empty(void)
+{
+ return igt_list_empty(&igt_facts_list_drm_card_head) &&
+ igt_list_empty(&igt_facts_list_kmod_head) &&
+ igt_list_empty(&igt_facts_list_ktaint_head) &&
+ igt_list_empty(&igt_facts_list_pci_gpu_head);
+}
+
+/**
+ * igt_facts_scan_pci_gpus:
+ * @last_test: name of the last test
+ *
+ * This function scans the pci bus for gpus using udev. It uses
+ * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to
+ * update igt_facts_list_pci_gpu_head.
+ *
+ * Returns: void
+ */
+static void igt_facts_scan_pci_gpus(const char *last_test)
+{
+ static struct igt_list_head *head = &igt_facts_list_pci_gpu_head;
+ struct udev *udev = NULL;
+ struct udev_enumerate *enumerate = NULL;
+ struct udev_list_entry *devices, *dev_list_entry;
+ struct igt_device_card card;
+ char pcistr[10];
+ int ret;
+ char *factname = NULL;
+ char *factvalue = NULL;
+
+ udev = udev_new();
+ if (!udev) {
+ igt_warn("Failed to create udev context\n");
+ return;
+ }
+
+ enumerate = udev_enumerate_new(udev);
+ if (!enumerate) {
+ igt_warn("Failed to create udev enumerate\n");
+ udev_unref(udev);
+ return;
+ }
+
+ ret = udev_enumerate_add_match_subsystem(enumerate, "pci");
+ if (ret < 0)
+ goto out;
+
+ ret = udev_enumerate_add_match_property(enumerate,
+ "PCI_CLASS",
+ "30000");
+ if (ret < 0)
+ goto out;
+
+ ret = udev_enumerate_add_match_property(enumerate,
+ "PCI_CLASS",
+ "38000");
+ if (ret < 0)
+ goto out;
+
+ ret = udev_enumerate_scan_devices(enumerate);
+ if (ret < 0)
+ goto out;
+
+ devices = udev_enumerate_get_list_entry(enumerate);
+ if (!devices)
+ goto out;
+
+ igt_facts_list_mark(head);
+
+ udev_list_entry_foreach(dev_list_entry, devices) {
+ const char *path;
+ struct udev_device *udev_dev;
+ struct udev_list_entry *entry;
+ char *model = NULL;
+ char *codename = NULL;
+ igt_fact *old_fact = NULL;
+
+ path = udev_list_entry_get_name(dev_list_entry);
+ udev_dev = udev_device_new_from_syspath(udev, path);
+ if (!udev_dev)
+ continue;
+
+ /* Strip path to only the content after the last / */
+ path = strrchr(path, '/');
+ if (path)
+ path++;
+ else
+ path = "unknown";
+
+ strcpy(card.pci_slot_name, "-");
+
+ entry = udev_device_get_properties_list_entry(udev_dev);
+ while (entry) {
+ const char *name = udev_list_entry_get_name(entry);
+ const char *value = udev_list_entry_get_value(entry);
+
+ entry = udev_list_entry_get_next(entry);
+ if (!strcmp(name, "ID_MODEL_FROM_DATABASE"))
+ model = strdup(value);
+ else if (!strcmp(name, "PCI_ID"))
+ igt_assert_eq(sscanf(value, "%hx:%hx",
+ &card.pci_vendor,
+ &card.pci_device), 2);
+ }
+ snprintf(pcistr, sizeof(pcistr), "%04x:%04x",
+ card.pci_vendor, card.pci_device);
+ codename = igt_device_get_pretty_name(&card, false);
+
+ /* Set codename to null if it is the same string as pci_id */
+ if (codename && strcmp(pcistr, codename) == 0) {
+ free(codename);
+ codename = NULL;
+ }
+ asprintf(&factname, "%s.%s", pci_gpu_fact, path);
+ asprintf(&factvalue,
+ "%s %s %s",
+ pcistr,
+ codename ? codename : "",
+ model ? model : "");
+
+ /**
+ * Loading and unloading the kmod may change the human
+ * readeable string in value. Do not change value if the
+ * pci id is the same.
+ */
+ old_fact = igt_facts_list_get(factname, head);
+ if (old_fact && strncmp(old_fact->value, factvalue, 9) == 0)
+ old_fact->present = true;
+ else
+ igt_facts_list_add(factname, factvalue, last_test, head);
+
+ free(codename);
+ free(model);
+ free(factname);
+ free(factvalue);
+ udev_device_unref(udev_dev);
+ }
+
+ igt_facts_list_sweep(head, last_test);
+
+out:
+ udev_enumerate_unref(enumerate);
+ udev_unref(udev);
+}
+
+/**
+ * igt_facts_scan_pci_drm_cards:
+ * @last_test: name of the last test
+ *
+ * This function scans the pci bus for drm cards using udev. It uses the
+ * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to
+ * update igt_facts_list_drm_card_head.
+ *
+ * Returns: void
+ */
+static void igt_facts_scan_pci_drm_cards(const char *last_test)
+{
+ static struct igt_list_head *head = &igt_facts_list_drm_card_head;
+ struct udev *udev = NULL;
+ struct udev_enumerate *enumerate = NULL;
+ struct udev_list_entry *devices, *dev_list_entry;
+ int ret;
+ char *factname = NULL;
+ char *factvalue = NULL;
+
+ udev = udev_new();
+ if (!udev)
+ return;
+
+ enumerate = udev_enumerate_new(udev);
+ if (!enumerate) {
+ udev_unref(udev);
+ return;
+ }
+
+ ret = udev_enumerate_add_match_subsystem(enumerate, "drm");
+ if (ret < 0)
+ goto out;
+
+ ret = udev_enumerate_scan_devices(enumerate);
+ if (ret < 0)
+ goto out;
+
+ devices = udev_enumerate_get_list_entry(enumerate);
+ if (!devices)
+ goto out;
+
+ ret = udev_enumerate_add_match_subsystem(enumerate, "drm");
+ if (ret < 0)
+ goto out;
+
+ ret = udev_enumerate_scan_devices(enumerate);
+ if (ret < 0)
+ goto out;
+
+ devices = udev_enumerate_get_list_entry(enumerate);
+ if (!devices)
+ goto out;
+
+ igt_facts_list_mark(head);
+
+ udev_list_entry_foreach(dev_list_entry, devices) {
+ const char *path;
+ struct udev_device *drm_dev, *pci_dev;
+ const char *drm_name, *pci_addr;
+
+ path = udev_list_entry_get_name(dev_list_entry);
+ drm_dev = udev_device_new_from_syspath(udev, path);
+ if (!drm_dev)
+ continue;
+
+ drm_name = udev_device_get_sysname(drm_dev);
+ /* Filter the device by name. Want devices such as card0 and card1.
+ * If the device has '-' in the name, contine
+ */
+ if (strncmp(drm_name, "card", 4) != 0 ||
+ strchr(drm_name, '-') != NULL) {
+ udev_device_unref(drm_dev);
+ continue;
+ }
+
+ /* Get the pci address of the gpu associated with the drm_dev*/
+ pci_dev = udev_device_get_parent_with_subsystem_devtype(drm_dev,
+ "pci",
+ NULL);
+ if (pci_dev) {
+ pci_addr = udev_device_get_sysattr_value(pci_dev,
+ "address");
+ if (!pci_addr)
+ pci_addr = udev_device_get_sysname(pci_dev);
+ } else {
+ /* Some GPUs are platform devices. Ignore them. */
+ pci_addr = NULL;
+ udev_device_unref(drm_dev);
+ continue;
+ }
+
+ asprintf(&factname, "%s.%s", drm_card_fact, pci_addr);
+ asprintf(&factvalue, "%s", drm_name);
+
+ igt_facts_list_add(factname, factvalue, last_test, head);
+
+ free(factname);
+ free(factvalue);
+ udev_device_unref(drm_dev);
+ }
+
+ igt_facts_list_sweep(head, last_test);
+
+out:
+ udev_enumerate_unref(enumerate);
+ udev_unref(udev);
+}
+
+/**
+ * igt_facts_scan_kernel_taints:
+ * @last_test: name of the last test
+ *
+ * This function scans for kernel taints using igt_kernel_tainted() and
+ * igt_explain_taints(). It will cut off the explanation keeping only the
+ * taint name.
+ *
+ * Returns: void
+ */
+static void igt_facts_scan_kernel_taints(const char *last_test)
+{
+ static struct igt_list_head *head = &igt_facts_list_ktaint_head;
+ unsigned long taints = 0;
+ const char *reason = NULL;
+ char *taint_name = NULL;
+ char *fact_name = NULL;
+
+ taints = igt_kernel_tainted(&taints);
+ /* For testing, set all bits to 1
+ * taints = 0xFFFFFFFF;
+ */
+
+
+ igt_facts_list_mark(head);
+
+ while ((reason = igt_explain_taints(&taints)) != NULL) {
+ /* Cut at the ':' to get only the taint name */
+ taint_name = strtok(strdup(reason), ":");
+ if (!taint_name)
+ continue;
+
+ /* Lowercase taint_name */
+ for (int i = 0; taint_name[i]; i++)
+ taint_name[i] = tolower(taint_name[i]);
+
+ asprintf(&fact_name, "%s.%s", ktaint_fact, taint_name);
+ igt_facts_list_add(fact_name, "true", last_test, head);
+
+ free(taint_name);
+ free(fact_name);
+ }
+
+ igt_facts_list_sweep(head, last_test);
+}
+
+
+/**
+ * igt_facts_scan_kernel_loaded_kmods:
+ * @last_test: name of the last test
+ *
+ * This function scans for loaded kmods using igt_fact_kmod_list and
+ * igt_kmod_is_loaded().
+ *
+ * Returns: void
+ */
+static void igt_facts_scan_kernel_loaded_kmods(const char *last_test)
+{
+ static struct igt_list_head *head = &igt_facts_list_kmod_head;
+ char *name = NULL;
+
+ igt_facts_list_mark(head);
+
+ /* Iterate over igt_fact_kmod_list[] until the element contains "\0" */
+ for (int i = 0; strcmp(igt_fact_kmod_list[i], "\0") != 0; i++) {
+ asprintf(&name, "%s.%s", kmod_fact, igt_fact_kmod_list[i]);
+ if (igt_kmod_is_loaded(igt_fact_kmod_list[i]))
+ igt_facts_list_add(name, "true", last_test, head);
+
+ free(name);
+ }
+
+ igt_facts_list_sweep(head, last_test);
+}
+
+/**
+ * igt_facts:
+ * @last_test: name of the last test
+ *
+ * Call this function where you want to gather and report facts.
+ *
+ * Returns: void
+ */
+void igt_facts(const char *last_test)
+{
+ igt_facts_scan_pci_gpus(last_test);
+ igt_facts_scan_pci_drm_cards(last_test);
+ igt_facts_scan_kernel_taints(last_test);
+ igt_facts_scan_kernel_loaded_kmods(last_test);
+
+ fflush(stdout);
+ fflush(stderr);
+}
+
+/*
+ * Testing
+ *
+ * Defined here to keep most of the functions static
+ *
+ */
+
+/**
+ * igt_facts_test_add_get:
+ * @head: head of the list
+ *
+ * Tests igt_facts_list_add and igt_facts_list_get.
+ *
+ * Returns: void
+ */
+static void igt_facts_test_add_get(struct igt_list_head *head)
+{
+ igt_fact *fact = NULL;
+ bool ret;
+ const char *name = "hardware.pci.gpu_at_addr.0000:00:02.0";
+ const char *value = "8086:64a0 Intel Lunarlake (Gen20)";
+ const char *last_test = NULL;
+
+ ret = igt_facts_list_add(name, value, last_test, head);
+ igt_assert(ret == true);
+
+ // Assert that there is one element in the linked list
+ igt_assert_eq(igt_list_length(head), 1);
+
+ // Assert that the element in the linked list is the one we added
+ fact = igt_facts_list_get(name, head);
+ igt_assert(fact != NULL);
+ igt_assert_eq(strcmp(fact->name, name), 0);
+ igt_assert_eq(strcmp(fact->value, value), 0);
+ igt_assert(fact->present == true);
+ igt_assert(fact->last_test == NULL);
+}
+
+/**
+ * igt_facts_test_mark_and_sweep:
+ * @head: head of the list
+ *
+ * - Add 3 elements to the list and mark them as not present.
+ * - Update two of the elements and mark them as present.
+ * - Sweep the list and assert that
+ * - Only the two updated elements are present
+ * - The third element was deleted
+ *
+ * Returns: void
+ */
+static void igt_facts_test_mark_and_sweep(struct igt_list_head *head)
+{
+ igt_fact *fact = NULL;
+ const char *name1 = "hardware.pci.gpu_at_addr.0000:00:02.0";
+ const char *value1 = "8086:64a0 Intel Lunarlake (Gen20)";
+ const char *name2 = "hardware.pci.gpu_at_addr.0000:00:03.0";
+ const char *value2 = "8086:64a1 Intel Lunarlake (Gen21)";
+ const char *name3 = "hardware.pci.gpu_at_addr.0000:00:04.0";
+ const char *value3 = "8086:64a2 Intel Lunarlake (Gen22)";
+
+ igt_facts_list_add(name1, value1, NULL, head);
+ igt_facts_list_add(name2, value2, NULL, head);
+ igt_facts_list_add(name3, value3, NULL, head);
+
+ igt_facts_list_mark(head);
+
+ igt_facts_list_add(name1, value1, NULL, head);
+ igt_facts_list_add(name2, value2, NULL, head);
+
+ igt_facts_list_sweep(head, NULL);
+
+ // Assert that there are two elements in the linked list
+ igt_assert_eq(igt_list_length(head), 2);
+
+ // Assert that the two updated elements are present
+ fact = igt_facts_list_get(name1, head);
+ igt_assert(fact != NULL);
+ igt_assert(fact->present == true);
+
+ fact = igt_facts_list_get(name2, head);
+ igt_assert(fact != NULL);
+ igt_assert(fact->present == true);
+
+ // Assert that the third element was deleted
+ fact = igt_facts_list_get(name3, head);
+ igt_assert(fact == NULL);
+}
+
+/**
+ * igt_facts_test:
+ *
+ * Main function for testing the igt_facts module
+ *
+ * Returns: bool indicating if the tests passed
+ */
+void igt_facts_test(void)
+{
+ const char *last_test = "Unit Testing";
+
+ igt_facts_lists_init();
+
+ /* Assert that all lists are empty */
+ igt_assert(igt_list_empty(&igt_facts_list_kmod_head));
+ igt_assert(igt_list_empty(&igt_facts_list_ktaint_head));
+ igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head));
+ igt_assert(igt_list_empty(&igt_facts_list_drm_card_head));
+
+ /* Assert that add and get work. Will add one element to the list */
+ igt_facts_test_add_get(&igt_facts_list_pci_gpu_head);
+
+ /* Assert that igt_facts_list_mark_and_sweep() cleans up the list */
+ igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == false);
+ igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head);
+ igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == true);
+
+ /* Test the mark and sweep pattern used to delete elements
+ * from the list
+ */
+ igt_facts_test_mark_and_sweep(&igt_facts_list_pci_gpu_head);
+
+ /* Clean up the list and call igt_facts(). This should not crash */
+ igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head);
+ igt_facts(last_test);
+}
diff --git a/lib/igt_facts.h b/lib/igt_facts.h
new file mode 100644
index 000000000..e4adca3fb
--- /dev/null
+++ b/lib/igt_facts.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright © 2024 Intel Corporation
+ */
+
+#include <stdbool.h>
+
+#include "igt_list.h"
+
+
+/* igt_fact:
+ * @name: name of the fact
+ * @value: value of the fact
+ * @last_test: name of the test that triggered the fact
+ * @present: bool indicating if fact is present. Used for deleting facts from
+ * the list.
+ * @link: link to the next fact
+ *
+ * A fact is a piece of information that can be used to determine the state of
+ * the system.
+ *
+ */
+typedef struct {
+ char *name;
+ char *value;
+ char *last_test;
+ bool present; /* For mark and seep */
+ struct igt_list_head link;
+} igt_fact;
+
+const char *igt_fact_kmod_list[] = {
+ "amdgpu",
+ "i915",
+ "nouveau",
+ "radeon",
+ "xe",
+ "\0"
+};
+
+const char *kmod_fact = "kernel.kmod_is_loaded"; /* true or false */
+const char *ktaint_fact = "kernel.is_tainted"; /* taint name: taint_warn */
+const char *pci_gpu_fact = "hardware.pci.gpu_at_addr"; /* id vendor model */
+const char *drm_card_fact = "hardware.pci.drm_card_at_addr"; /* cardX */
+
+void igt_facts_lists_init(void);
+void igt_facts(const char *last_test);
+bool igt_facts_are_all_lists_empty(void);
+void igt_facts_test(void); /* For unit testing only */
diff --git a/lib/meson.build b/lib/meson.build
index c3556a921..c44ca2b5a 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -18,6 +18,7 @@ lib_sources = [
'i915/i915_crc.c',
'igt_collection.c',
'igt_color_encoding.c',
+ 'igt_facts.c',
'igt_crc.c',
'igt_debugfs.c',
'igt_device.c',
diff --git a/lib/tests/igt_facts.c b/lib/tests/igt_facts.c
new file mode 100644
index 000000000..7fa9d0f22
--- /dev/null
+++ b/lib/tests/igt_facts.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: MIT
+// Copyright © 2024 Intel Corporation
+
+#include <stdbool.h>
+
+#include "igt_core.h"
+#include "igt_facts.h"
+
+/* Tests are not defined here so we can keep most of the functions static */
+
+igt_simple_main
+{
+ igt_info("Running igt_facts_test\n");
+ igt_facts_test();
+}
diff --git a/lib/tests/meson.build b/lib/tests/meson.build
index df8092638..1ce19f63c 100644
--- a/lib/tests/meson.build
+++ b/lib/tests/meson.build
@@ -8,6 +8,7 @@ lib_tests = [
'igt_dynamic_subtests',
'igt_edid',
'igt_exit_handler',
+ 'igt_facts',
'igt_fork',
'igt_fork_helper',
'igt_hook',
diff --git a/runner/executor.c b/runner/executor.c
index ac73e1dde..d1eca3c05 100644
--- a/runner/executor.c
+++ b/runner/executor.c
@@ -30,6 +30,7 @@
#include "igt_aux.h"
#include "igt_core.h"
+#include "igt_facts.h"
#include "igt_taints.h"
#include "igt_vec.h"
#include "executor.h"
@@ -2306,6 +2307,9 @@ bool execute(struct execute_state *state,
sigset_t sigmask;
double time_spent = 0.0;
bool status = true;
+ char *last_test = NULL;
+
+ igt_facts_lists_init();
if (state->dry) {
outf("Dry run, not executing. Invoke igt_resume if you want to execute.\n");
@@ -2438,6 +2442,10 @@ bool execute(struct execute_state *state,
int result;
bool already_written = false;
+ /* Calls before running each test */
+ igt_facts(last_test);
+ last_test = entry_display_name(&job_list->entries[state->next]);
+
if (should_die_because_signal(sigfd)) {
status = false;
goto end;
@@ -2526,6 +2534,8 @@ bool execute(struct execute_state *state,
return execute(state, settings, job_list);
}
}
+ /* Last call to collect facts after the last test runs */
+ igt_facts(last_test);
if ((timefd = openat(resdirfd, "endtime.txt", O_CREAT | O_WRONLY | O_EXCL, 0666)) >= 0) {
dprintf(timefd, "%f\n", timeofday_double());
diff --git a/tools/lsfacts.c b/tools/lsfacts.c
new file mode 100644
index 000000000..10dee0317
--- /dev/null
+++ b/tools/lsfacts.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: MIT
+// Copyright © 2024 Intel Corporation
+
+#include "igt.h"
+#include "igt_facts.h"
+
+/**
+ * SECTION:lsfacts
+ * @short_description: lsfacts
+ * @title: lsfacts
+ * @include: lsfacts.c
+ *
+ * # lsfacts
+ *
+ * Scan for igt-facts and print them on screen. Indicate if no facts are found.
+ */
+int main(int argc, char *argv[])
+{
+ igt_facts_lists_init();
+
+ igt_facts("lsfacts");
+
+ if (igt_facts_are_all_lists_empty())
+ igt_info("No facts found...\n");
+}
diff --git a/tools/meson.build b/tools/meson.build
index 48c9a4b50..ff1b0ef90 100644
--- a/tools/meson.build
+++ b/tools/meson.build
@@ -42,6 +42,7 @@ tools_progs = [
'intel_gem_info',
'intel_gvtg_test',
'dpcd_reg',
+ 'lsfacts',
'lsgpu',
'power',
]
--
2.34.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* RE: [PATCH i-g-t v10] igt-runner fact checking
2024-12-05 4:54 ` [PATCH i-g-t v10] " Peter Senna Tschudin
@ 2024-12-05 9:08 ` Piatkowski, Dominik Karol
2024-12-06 11:42 ` Kamil Konieczny
1 sibling, 0 replies; 13+ messages in thread
From: Piatkowski, Dominik Karol @ 2024-12-05 9:08 UTC (permalink / raw)
To: Peter Senna Tschudin, igt-dev@lists.freedesktop.org
Cc: Knop, Ryszard, Janusz Krzysztofik, Kempczynski, Zbigniew,
De Marchi, Lucas, Coelho, Luciano, Das, Nirmoy, Summers, Stuart,
Ghimiray, Himal Prasad, Piecielska, Katarzyna
Hi Peter,
> -----Original Message-----
> From: Peter Senna Tschudin <peter.senna@linux.intel.com>
> Sent: Thursday, December 5, 2024 5:54 AM
> To: igt-dev@lists.freedesktop.org
> Cc: Knop, Ryszard <ryszard.knop@intel.com>; Janusz Krzysztofik
> <janusz.krzysztofik@linux.intel.com>; Kempczynski, Zbigniew
> <zbigniew.kempczynski@intel.com>; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Coelho, Luciano <luciano.coelho@intel.com>;
> Das, Nirmoy <nirmoy.das@intel.com>; Summers, Stuart
> <stuart.summers@intel.com>; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray@intel.com>; Piatkowski, Dominik Karol
> <dominik.karol.piatkowski@intel.com>; Piecielska, Katarzyna
> <katarzyna.piecielska@intel.com>
> Subject: [PATCH i-g-t v10] igt-runner fact checking
>
> When using igt-runner, collect facts before each test and after the last test, and
> report when facts change. The facts are:
> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel
> Battlemage (Gen20)
> - Associations between PCI GPU and DRM card:
> hardware.pci.drm_card_at_addr.0000:03:00.0: card1
> - Kernel taints: kernel.is_tainted.taint_warn: true
> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
>
> This change imposes little execution overhead and adds just a few lines of
> logging. The facts will be printed on normal igt-runner output. Here is a real
> example from our CI showing hotreplug-lateclose changing the DRM card
> number and tainting the kernel on the abort path:
>
> [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose)
> [245.383596] Starting subtest: hotreplug-lateclose [249.843361] Aborting:
> Lockdep not active [249.858249] [FACT core_hotunplug (hotreplug-
> lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 ->
> card1 [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new:
> kernel.is_tainted.taint_die: true [249.859075] Closing watchdogs
>
> CC: Ryszard Knop <ryszard.knop@intel.com>
> CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> CC: Lucas De Marchi <lucas.demarchi@intel.com>
> CC: luciano.coelho@intel.com
> CC: nirmoy.das@intel.com
> CC: stuart.summers@intel.com
> CC: himal.prasad.ghimiray@intel.com
> CC: dominik.karol.piatkowski@intel.com
> CC: katarzyna.piecielska@intel.com
> Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
> ---
> Re-sending as the mailing list server rejected the message yesterday due to a
> probably full disk...
>
> v10:
> - fix memory leaks from asprintf (Thank you Dominik Karol!)
> - fix comments for consistency (Thank you Dominik Karol!)
>
> v9:
> - do not report new hardware when loading/unloading kmod changes
> the string of the GPU name. I accidentally reintroduced this
> issue when refactoring to use linked lists.
> - add tools/lsfacts: 9 lines of code that print either the facts
> or that no facts were found.
> - fix code comments describing functions
> - fix white space issues
>
> v8:
> - fix white space issues
>
> v7:
> - refactor to use linked lists provided by igt_lists
> - Added function arguments to code comments
> - updated commit message
>
> v6:
> - sort includes in igt_facts.c alphabetically
> - add facts for kernel taints using igt_kernel_tainted() and
> igt_explain_taints()
>
> v5:
> - fix the broken patch format from v4
>
> v4:
> - fix a bug on delete_fact()
> - drop glib and calls to g_ functions
> - change commit message to indicate that report only on fact changes
> - use consistent format for reporting changes
> - fix SPDX header format
>
> v3:
> - refreshed commit message
> - changed format SPDX string
> - removed license text
> - replace last_test assignment when null by two ternary operators
> - added function descriptions following example found elsewhere in
> the code
> - added igt_assert to catch failures to realloc()
>
> v2:
> - add lib/tests/igt_facts.c for basic unit testing
> - bugfix: do not report a new gpu when the driver changes the gpu name
> - bugfix: do not report the pci_id twice on the gpu name
>
> lib/igt_facts.c | 755
> ++++++++++++++++++++++++++++++++++++++++++
> lib/igt_facts.h | 47 +++
> lib/meson.build | 1 +
> lib/tests/igt_facts.c | 15 +
> lib/tests/meson.build | 1 +
> runner/executor.c | 10 +
> tools/lsfacts.c | 25 ++
> tools/meson.build | 1 +
> 8 files changed, 855 insertions(+)
> create mode 100644 lib/igt_facts.c
> create mode 100644 lib/igt_facts.h
> create mode 100644 lib/tests/igt_facts.c create mode 100644 tools/lsfacts.c
>
> diff --git a/lib/igt_facts.c b/lib/igt_facts.c new file mode 100644 index
> 000000000..4749d3417
> --- /dev/null
> +++ b/lib/igt_facts.c
> @@ -0,0 +1,755 @@
> +// SPDX-License-Identifier: MIT
> +// Copyright C 2024 Intel Corporation
> +
> +#include <ctype.h>
> +#include <libudev.h>
> +#include <stdio.h>
> +#include <sys/time.h>
> +#include <time.h>
> +
> +#include "igt_core.h"
> +#include "igt_device_scan.h"
> +#include "igt_facts.h"
> +#include "igt_kmod.h"
> +#include "igt_list.h"
> +#include "igt_taints.h"
> +
> +static struct igt_list_head igt_facts_list_drm_card_head; static struct
> +igt_list_head igt_facts_list_kmod_head; static struct igt_list_head
> +igt_facts_list_ktaint_head; static struct igt_list_head
> +igt_facts_list_pci_gpu_head;
> +
> +
> +/**
> + * igt_facts_lists_init:
> + *
> + * Initialize igt_facts linked lists.
> + *
> + * Returns: void
> + */
> +void igt_facts_lists_init(void)
> +{
> + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head);
> + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head);
> + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head);
> + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head);
> +}
> +
> +
> +/**
> + * igt_facts_log:
> + * @last_test: name of the test that triggered the fact
> + * @name: name of the fact
> + * @new_value: new value of the fact
> + * @old_value: old value of the fact
> + *
> + * Reports fact changes:
> + * - new fact: if old_value is NULL and new_value is not NULL
> + * - deleted fact: if new_value is NULL and old_value is not NULL
> + * - changed fact: if new_value is different from old_value
> + *
> + * Returns: void
> + */
> +static void igt_facts_log(const char *last_test, const char *name,
> + const char *new_value, const char *old_value) {
> + struct timespec uptime_ts;
> + char *uptime = NULL;
> + const char *before_tests = "before any test";
> +
> + if (old_value == NULL && new_value == NULL)
> + return;
> +
> + if (clock_gettime(CLOCK_BOOTTIME, &uptime_ts) != 0)
> + return;
> +
> + asprintf(&uptime,
> + "%ld.%06ld",
> + uptime_ts.tv_sec,
> + uptime_ts.tv_nsec / 1000);
> +
> + /* New fact */
> + if (old_value == NULL && new_value != NULL) {
> + igt_info("[%s] [FACT %s] new: %s: %s\n",
> + uptime,
> + last_test ? last_test : before_tests,
> + name,
> + new_value);
> + goto out;
> + }
> +
> + /* Update fact */
> + if (old_value != NULL && new_value != NULL) {
> + igt_info("[%s] [FACT %s] changed: %s: %s -> %s\n",
> + uptime,
> + last_test ? last_test : before_tests,
> + name,
> + old_value,
> + new_value);
> + goto out;
> + }
> +
> + /* Deleted fact */
> + if (old_value != NULL && new_value == NULL) {
> + igt_info("[%s] [FACT %s] deleted: %s: %s\n",
> + uptime,
> + last_test ? last_test : before_tests,
> + name,
> + old_value);
> + goto out;
> + }
> +
> +out:
> + free(uptime);
> +}
> +
> +/**
> + * igt_facts_list_get:
> + * @name: name of the fact to be added
> + * @head: head of the list
> + *
> + * Get a fact from the list.
> + *
> + * Returns: pointer to the fact if found, NULL otherwise
> + *
> + */
> +static igt_fact *igt_facts_list_get(const char *name,
> + struct igt_list_head *head)
> +{
> + igt_fact *fact = NULL;
> +
> + if (igt_list_empty(head))
> + return NULL;
> +
> + igt_list_for_each_entry(fact, head, link) {
> + if (strcmp(fact->name, name) == 0)
> + return fact;
> + }
> + return NULL;
> +}
> +
> +/**
> + * igt_facts_list_del:
> + * @name: name of the fact to be added
> + * @head: head of the list
> + * @last_test: name of the last test
> + * @log: bool indicating if the delete operation should be logged
> + *
> + * Delete a fact from the list.
> + *
> + * Returns: bool indicating if fact was deleted from the list
> + *
> + */
> +static bool igt_facts_list_del(const char *name,
> + struct igt_list_head *head,
> + const char *last_test,
> + bool log)
> +{
> + igt_fact *fact = NULL;
> +
> + if (igt_list_empty(head))
> + return false;
> +
> + igt_list_for_each_entry(fact, head, link) {
> + if (strcmp(fact->name, name) == 0) {
> + if (log)
> + igt_facts_log(last_test, fact->name,
> + NULL, fact->value);
> +
> + igt_list_del(&fact->link);
> + free(fact->name);
> + free(fact->value);
> + free(fact->last_test);
> + free(fact);
> + return true;
> + }
> + }
> + return false;
> +}
> +
> +/**
> + * igt_facts_list_add:
> + * @name: name of the fact to be added
> + * @value: value of the fact to be added
> + * @last_test: name of the last test
> + * @head: head of the list
> + *
> + * Returns: bool indicating if fact was added to the list
> + *
> + */
> +static bool igt_facts_list_add(const char *name,
> + const char *value,
> + const char *last_test,
> + struct igt_list_head *head)
> +{
> + igt_fact *new_fact = NULL, *old_fact = NULL;
> + bool logged = false;
> +
> + if (name == NULL || value == NULL)
> + return false;
> +
> + old_fact = igt_facts_list_get(name, head);
> + if (old_fact) {
> + if (strcmp(old_fact->value, value) == 0) {
> + old_fact->present = true;
> + return false;
> + }
> + igt_facts_log(last_test, name, value, old_fact->value);
> + logged = true;
> + igt_facts_list_del(name, head, last_test, false);
> + }
> +
> + new_fact = malloc(sizeof(igt_fact));
> + if (new_fact == NULL)
> + return false;
> +
> + new_fact->name = strdup(name);
> + new_fact->value = strdup(value);
> + new_fact->last_test = last_test ? strdup(last_test) : NULL;
> + new_fact->present = true;
> +
> + if (!logged)
> + igt_facts_log(last_test, name, value, NULL);
> +
> + igt_list_add(&new_fact->link, head);
> +
> + return true;
> +}
> +
> +/**
> + * igt_facts_list_mark:
> + * @head: head of the list
> + *
> + * Mark all facts in the list as not present. Opted for the mark and
> +sweep
> + * design pattern due to its simplicity and efficiency.
> + *
> + * Returns: void
> + */
> +static void igt_facts_list_mark(struct igt_list_head *head) {
> + igt_fact *fact = NULL;
> +
> + if (igt_list_empty(head))
> + return;
> +
> + igt_list_for_each_entry(fact, head, link)
> + fact->present = false;
> +}
> +
> +/**
> + * igt_facts_list_sweep:
> + * @head: head of the list
> + * @last_test: name of the last test
> + *
> + * Sweep the list and delete all facts that are not present. Opted for
> +the mark
> + * and sweep design pattern due to its simplicity and efficiency.
> + *
> + * Returns: void
> + */
> +static void igt_facts_list_sweep(struct igt_list_head *head,
> + const char *last_test)
> +{
> + igt_fact *fact = NULL, *tmp = NULL;
> +
> + if (igt_list_empty(head))
> + return;
> +
> + igt_list_for_each_entry_safe(fact, tmp, head, link)
> + if (!fact->present)
> + igt_facts_list_del(fact->name, head, last_test, true); }
> +
> +/**
> + * igt_facts_list_mark_and_sweep:
> + * @head: head of the list
> + *
> + * Clean up the list using mark and sweep. Opted for the mark and sweep
> + * design pattern due to its simplicity and efficiency.
> + *
> + * Returns: void
> + */
> +static void igt_facts_list_mark_and_sweep(struct igt_list_head *head) {
> + igt_facts_list_mark(head);
> + igt_facts_list_sweep(head, NULL);
> +}
> +
> +/**
> + * igt_facts_are_all_lists_empty:
> + *
> + * Returns true if all lists are empty. Used by the tool lsfacts.
> + *
> + * Returns: bool
> + */
> +bool igt_facts_are_all_lists_empty(void)
> +{
> + return igt_list_empty(&igt_facts_list_drm_card_head) &&
> + igt_list_empty(&igt_facts_list_kmod_head) &&
> + igt_list_empty(&igt_facts_list_ktaint_head) &&
> + igt_list_empty(&igt_facts_list_pci_gpu_head);
> +}
> +
> +/**
> + * igt_facts_scan_pci_gpus:
> + * @last_test: name of the last test
> + *
> + * This function scans the pci bus for gpus using udev. It uses
> + * igt_facts_list_mark(), igt_facts_list_add() and
> +igt_facts_list_sweep() to
> + * update igt_facts_list_pci_gpu_head.
> + *
> + * Returns: void
> + */
> +static void igt_facts_scan_pci_gpus(const char *last_test) {
> + static struct igt_list_head *head = &igt_facts_list_pci_gpu_head;
> + struct udev *udev = NULL;
> + struct udev_enumerate *enumerate = NULL;
> + struct udev_list_entry *devices, *dev_list_entry;
> + struct igt_device_card card;
> + char pcistr[10];
> + int ret;
> + char *factname = NULL;
> + char *factvalue = NULL;
> +
> + udev = udev_new();
> + if (!udev) {
> + igt_warn("Failed to create udev context\n");
> + return;
> + }
> +
> + enumerate = udev_enumerate_new(udev);
> + if (!enumerate) {
> + igt_warn("Failed to create udev enumerate\n");
> + udev_unref(udev);
> + return;
> + }
> +
> + ret = udev_enumerate_add_match_subsystem(enumerate, "pci");
> + if (ret < 0)
> + goto out;
> +
> + ret = udev_enumerate_add_match_property(enumerate,
> + "PCI_CLASS",
> + "30000");
> + if (ret < 0)
> + goto out;
> +
> + ret = udev_enumerate_add_match_property(enumerate,
> + "PCI_CLASS",
> + "38000");
> + if (ret < 0)
> + goto out;
> +
> + ret = udev_enumerate_scan_devices(enumerate);
> + if (ret < 0)
> + goto out;
> +
> + devices = udev_enumerate_get_list_entry(enumerate);
> + if (!devices)
> + goto out;
> +
> + igt_facts_list_mark(head);
> +
> + udev_list_entry_foreach(dev_list_entry, devices) {
> + const char *path;
> + struct udev_device *udev_dev;
> + struct udev_list_entry *entry;
> + char *model = NULL;
> + char *codename = NULL;
> + igt_fact *old_fact = NULL;
> +
> + path = udev_list_entry_get_name(dev_list_entry);
> + udev_dev = udev_device_new_from_syspath(udev, path);
> + if (!udev_dev)
> + continue;
> +
> + /* Strip path to only the content after the last / */
> + path = strrchr(path, '/');
> + if (path)
> + path++;
> + else
> + path = "unknown";
> +
> + strcpy(card.pci_slot_name, "-");
> +
> + entry = udev_device_get_properties_list_entry(udev_dev);
> + while (entry) {
> + const char *name = udev_list_entry_get_name(entry);
> + const char *value = udev_list_entry_get_value(entry);
> +
> + entry = udev_list_entry_get_next(entry);
> + if (!strcmp(name, "ID_MODEL_FROM_DATABASE"))
> + model = strdup(value);
> + else if (!strcmp(name, "PCI_ID"))
> + igt_assert_eq(sscanf(value, "%hx:%hx",
> + &card.pci_vendor,
> + &card.pci_device), 2);
> + }
> + snprintf(pcistr, sizeof(pcistr), "%04x:%04x",
> + card.pci_vendor, card.pci_device);
> + codename = igt_device_get_pretty_name(&card, false);
> +
> + /* Set codename to null if it is the same string as pci_id */
> + if (codename && strcmp(pcistr, codename) == 0) {
> + free(codename);
> + codename = NULL;
> + }
> + asprintf(&factname, "%s.%s", pci_gpu_fact, path);
> + asprintf(&factvalue,
> + "%s %s %s",
> + pcistr,
> + codename ? codename : "",
> + model ? model : "");
> +
> + /**
> + * Loading and unloading the kmod may change the human
> + * readeable string in value. Do not change value if the
> + * pci id is the same.
> + */
> + old_fact = igt_facts_list_get(factname, head);
> + if (old_fact && strncmp(old_fact->value, factvalue, 9) == 0)
> + old_fact->present = true;
> + else
> + igt_facts_list_add(factname, factvalue, last_test,
> head);
> +
> + free(codename);
> + free(model);
> + free(factname);
> + free(factvalue);
> + udev_device_unref(udev_dev);
> + }
> +
> + igt_facts_list_sweep(head, last_test);
> +
> +out:
> + udev_enumerate_unref(enumerate);
> + udev_unref(udev);
> +}
> +
> +/**
> + * igt_facts_scan_pci_drm_cards:
> + * @last_test: name of the last test
> + *
> + * This function scans the pci bus for drm cards using udev. It uses
> +the
> + * igt_facts_list_mark(), igt_facts_list_add() and
> +igt_facts_list_sweep() to
> + * update igt_facts_list_drm_card_head.
> + *
> + * Returns: void
> + */
> +static void igt_facts_scan_pci_drm_cards(const char *last_test) {
> + static struct igt_list_head *head = &igt_facts_list_drm_card_head;
> + struct udev *udev = NULL;
> + struct udev_enumerate *enumerate = NULL;
> + struct udev_list_entry *devices, *dev_list_entry;
> + int ret;
> + char *factname = NULL;
> + char *factvalue = NULL;
> +
> + udev = udev_new();
> + if (!udev)
> + return;
> +
> + enumerate = udev_enumerate_new(udev);
> + if (!enumerate) {
> + udev_unref(udev);
> + return;
> + }
> +
> + ret = udev_enumerate_add_match_subsystem(enumerate, "drm");
> + if (ret < 0)
> + goto out;
> +
> + ret = udev_enumerate_scan_devices(enumerate);
> + if (ret < 0)
> + goto out;
> +
> + devices = udev_enumerate_get_list_entry(enumerate);
> + if (!devices)
> + goto out;
> +
> + ret = udev_enumerate_add_match_subsystem(enumerate, "drm");
> + if (ret < 0)
> + goto out;
> +
> + ret = udev_enumerate_scan_devices(enumerate);
> + if (ret < 0)
> + goto out;
> +
> + devices = udev_enumerate_get_list_entry(enumerate);
> + if (!devices)
> + goto out;
> +
> + igt_facts_list_mark(head);
> +
> + udev_list_entry_foreach(dev_list_entry, devices) {
> + const char *path;
> + struct udev_device *drm_dev, *pci_dev;
> + const char *drm_name, *pci_addr;
> +
> + path = udev_list_entry_get_name(dev_list_entry);
> + drm_dev = udev_device_new_from_syspath(udev, path);
> + if (!drm_dev)
> + continue;
> +
> + drm_name = udev_device_get_sysname(drm_dev);
> + /* Filter the device by name. Want devices such as card0 and
> card1.
> + * If the device has '-' in the name, contine
> + */
> + if (strncmp(drm_name, "card", 4) != 0 ||
> + strchr(drm_name, '-') != NULL) {
> + udev_device_unref(drm_dev);
> + continue;
> + }
> +
> + /* Get the pci address of the gpu associated with the
> drm_dev*/
> + pci_dev =
> udev_device_get_parent_with_subsystem_devtype(drm_dev,
> + "pci",
> +
> NULL);
> + if (pci_dev) {
> + pci_addr = udev_device_get_sysattr_value(pci_dev,
> + "address");
> + if (!pci_addr)
> + pci_addr =
> udev_device_get_sysname(pci_dev);
> + } else {
> + /* Some GPUs are platform devices. Ignore them. */
> + pci_addr = NULL;
> + udev_device_unref(drm_dev);
> + continue;
> + }
> +
> + asprintf(&factname, "%s.%s", drm_card_fact, pci_addr);
> + asprintf(&factvalue, "%s", drm_name);
> +
> + igt_facts_list_add(factname, factvalue, last_test, head);
> +
> + free(factname);
> + free(factvalue);
> + udev_device_unref(drm_dev);
> + }
> +
> + igt_facts_list_sweep(head, last_test);
> +
> +out:
> + udev_enumerate_unref(enumerate);
> + udev_unref(udev);
> +}
> +
> +/**
> + * igt_facts_scan_kernel_taints:
> + * @last_test: name of the last test
> + *
> + * This function scans for kernel taints using igt_kernel_tainted() and
> + * igt_explain_taints(). It will cut off the explanation keeping only
> +the
> + * taint name.
> + *
> + * Returns: void
> + */
> +static void igt_facts_scan_kernel_taints(const char *last_test) {
> + static struct igt_list_head *head = &igt_facts_list_ktaint_head;
> + unsigned long taints = 0;
> + const char *reason = NULL;
> + char *taint_name = NULL;
> + char *fact_name = NULL;
> +
> + taints = igt_kernel_tainted(&taints);
> + /* For testing, set all bits to 1
> + * taints = 0xFFFFFFFF;
> + */
> +
> +
> + igt_facts_list_mark(head);
> +
> + while ((reason = igt_explain_taints(&taints)) != NULL) {
> + /* Cut at the ':' to get only the taint name */
> + taint_name = strtok(strdup(reason), ":");
> + if (!taint_name)
> + continue;
> +
> + /* Lowercase taint_name */
> + for (int i = 0; taint_name[i]; i++)
> + taint_name[i] = tolower(taint_name[i]);
> +
> + asprintf(&fact_name, "%s.%s", ktaint_fact, taint_name);
> + igt_facts_list_add(fact_name, "true", last_test, head);
> +
> + free(taint_name);
> + free(fact_name);
> + }
> +
> + igt_facts_list_sweep(head, last_test); }
> +
> +
> +/**
> + * igt_facts_scan_kernel_loaded_kmods:
> + * @last_test: name of the last test
> + *
> + * This function scans for loaded kmods using igt_fact_kmod_list and
> + * igt_kmod_is_loaded().
> + *
> + * Returns: void
> + */
> +static void igt_facts_scan_kernel_loaded_kmods(const char *last_test) {
> + static struct igt_list_head *head = &igt_facts_list_kmod_head;
> + char *name = NULL;
> +
> + igt_facts_list_mark(head);
> +
> + /* Iterate over igt_fact_kmod_list[] until the element contains "\0" */
> + for (int i = 0; strcmp(igt_fact_kmod_list[i], "\0") != 0; i++) {
> + asprintf(&name, "%s.%s", kmod_fact, igt_fact_kmod_list[i]);
> + if (igt_kmod_is_loaded(igt_fact_kmod_list[i]))
> + igt_facts_list_add(name, "true", last_test, head);
> +
> + free(name);
> + }
> +
> + igt_facts_list_sweep(head, last_test); }
> +
> +/**
> + * igt_facts:
> + * @last_test: name of the last test
> + *
> + * Call this function where you want to gather and report facts.
> + *
> + * Returns: void
> + */
> +void igt_facts(const char *last_test)
> +{
> + igt_facts_scan_pci_gpus(last_test);
> + igt_facts_scan_pci_drm_cards(last_test);
> + igt_facts_scan_kernel_taints(last_test);
> + igt_facts_scan_kernel_loaded_kmods(last_test);
> +
> + fflush(stdout);
> + fflush(stderr);
> +}
> +
> +/*
> + * Testing
> + *
> + * Defined here to keep most of the functions static
> + *
> + */
> +
> +/**
> + * igt_facts_test_add_get:
> + * @head: head of the list
> + *
> + * Tests igt_facts_list_add and igt_facts_list_get.
> + *
> + * Returns: void
> + */
> +static void igt_facts_test_add_get(struct igt_list_head *head) {
> + igt_fact *fact = NULL;
> + bool ret;
> + const char *name = "hardware.pci.gpu_at_addr.0000:00:02.0";
> + const char *value = "8086:64a0 Intel Lunarlake (Gen20)";
> + const char *last_test = NULL;
> +
> + ret = igt_facts_list_add(name, value, last_test, head);
> + igt_assert(ret == true);
> +
> + // Assert that there is one element in the linked list
> + igt_assert_eq(igt_list_length(head), 1);
> +
> + // Assert that the element in the linked list is the one we added
> + fact = igt_facts_list_get(name, head);
> + igt_assert(fact != NULL);
> + igt_assert_eq(strcmp(fact->name, name), 0);
> + igt_assert_eq(strcmp(fact->value, value), 0);
> + igt_assert(fact->present == true);
> + igt_assert(fact->last_test == NULL);
> +}
> +
> +/**
> + * igt_facts_test_mark_and_sweep:
> + * @head: head of the list
> + *
> + * - Add 3 elements to the list and mark them as not present.
> + * - Update two of the elements and mark them as present.
> + * - Sweep the list and assert that
> + * - Only the two updated elements are present
> + * - The third element was deleted
> + *
> + * Returns: void
> + */
> +static void igt_facts_test_mark_and_sweep(struct igt_list_head *head) {
> + igt_fact *fact = NULL;
> + const char *name1 = "hardware.pci.gpu_at_addr.0000:00:02.0";
> + const char *value1 = "8086:64a0 Intel Lunarlake (Gen20)";
> + const char *name2 = "hardware.pci.gpu_at_addr.0000:00:03.0";
> + const char *value2 = "8086:64a1 Intel Lunarlake (Gen21)";
> + const char *name3 = "hardware.pci.gpu_at_addr.0000:00:04.0";
> + const char *value3 = "8086:64a2 Intel Lunarlake (Gen22)";
> +
> + igt_facts_list_add(name1, value1, NULL, head);
> + igt_facts_list_add(name2, value2, NULL, head);
> + igt_facts_list_add(name3, value3, NULL, head);
> +
> + igt_facts_list_mark(head);
> +
> + igt_facts_list_add(name1, value1, NULL, head);
> + igt_facts_list_add(name2, value2, NULL, head);
> +
> + igt_facts_list_sweep(head, NULL);
> +
> + // Assert that there are two elements in the linked list
> + igt_assert_eq(igt_list_length(head), 2);
> +
> + // Assert that the two updated elements are present
> + fact = igt_facts_list_get(name1, head);
> + igt_assert(fact != NULL);
> + igt_assert(fact->present == true);
> +
> + fact = igt_facts_list_get(name2, head);
> + igt_assert(fact != NULL);
> + igt_assert(fact->present == true);
> +
> + // Assert that the third element was deleted
> + fact = igt_facts_list_get(name3, head);
> + igt_assert(fact == NULL);
> +}
> +
> +/**
> + * igt_facts_test:
> + *
> + * Main function for testing the igt_facts module
> + *
> + * Returns: bool indicating if the tests passed */ void
> +igt_facts_test(void) {
> + const char *last_test = "Unit Testing";
> +
> + igt_facts_lists_init();
> +
> + /* Assert that all lists are empty */
> + igt_assert(igt_list_empty(&igt_facts_list_kmod_head));
> + igt_assert(igt_list_empty(&igt_facts_list_ktaint_head));
> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head));
> + igt_assert(igt_list_empty(&igt_facts_list_drm_card_head));
> +
> + /* Assert that add and get work. Will add one element to the list */
> + igt_facts_test_add_get(&igt_facts_list_pci_gpu_head);
> +
> + /* Assert that igt_facts_list_mark_and_sweep() cleans up the list */
> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == false);
> + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head);
> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == true);
> +
> + /* Test the mark and sweep pattern used to delete elements
> + * from the list
> + */
> + igt_facts_test_mark_and_sweep(&igt_facts_list_pci_gpu_head);
> +
> + /* Clean up the list and call igt_facts(). This should not crash */
> + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head);
> + igt_facts(last_test);
> +}
> diff --git a/lib/igt_facts.h b/lib/igt_facts.h new file mode 100644 index
> 000000000..e4adca3fb
> --- /dev/null
> +++ b/lib/igt_facts.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: MIT
> + * Copyright C 2024 Intel Corporation
> + */
> +
> +#include <stdbool.h>
> +
> +#include "igt_list.h"
> +
> +
> +/* igt_fact:
> + * @name: name of the fact
> + * @value: value of the fact
> + * @last_test: name of the test that triggered the fact
> + * @present: bool indicating if fact is present. Used for deleting
> +facts from
> + * the list.
> + * @link: link to the next fact
> + *
> + * A fact is a piece of information that can be used to determine the
> +state of
> + * the system.
> + *
> + */
> +typedef struct {
> + char *name;
> + char *value;
> + char *last_test;
> + bool present; /* For mark and seep */
Typo: seep -> sweep
With this fixed,
Reviewed-by: Dominik Karol Piątkowski <dominik.karol.piatkowski@intel.com>
> + struct igt_list_head link;
> +} igt_fact;
> +
> +const char *igt_fact_kmod_list[] = {
> + "amdgpu",
> + "i915",
> + "nouveau",
> + "radeon",
> + "xe",
> + "\0"
> +};
> +
> +const char *kmod_fact = "kernel.kmod_is_loaded"; /* true or false */
> +const char *ktaint_fact = "kernel.is_tainted"; /* taint name: taint_warn */
> +const char *pci_gpu_fact = "hardware.pci.gpu_at_addr"; /* id vendor
> +model */ const char *drm_card_fact = "hardware.pci.drm_card_at_addr";
> +/* cardX */
> +
> +void igt_facts_lists_init(void);
> +void igt_facts(const char *last_test);
> +bool igt_facts_are_all_lists_empty(void);
> +void igt_facts_test(void); /* For unit testing only */
> diff --git a/lib/meson.build b/lib/meson.build index c3556a921..c44ca2b5a
> 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -18,6 +18,7 @@ lib_sources = [
> 'i915/i915_crc.c',
> 'igt_collection.c',
> 'igt_color_encoding.c',
> + 'igt_facts.c',
> 'igt_crc.c',
> 'igt_debugfs.c',
> 'igt_device.c',
> diff --git a/lib/tests/igt_facts.c b/lib/tests/igt_facts.c new file mode 100644
> index 000000000..7fa9d0f22
> --- /dev/null
> +++ b/lib/tests/igt_facts.c
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: MIT
> +// Copyright C 2024 Intel Corporation
> +
> +#include <stdbool.h>
> +
> +#include "igt_core.h"
> +#include "igt_facts.h"
> +
> +/* Tests are not defined here so we can keep most of the functions
> +static */
> +
> +igt_simple_main
> +{
> + igt_info("Running igt_facts_test\n");
> + igt_facts_test();
> +}
> diff --git a/lib/tests/meson.build b/lib/tests/meson.build index
> df8092638..1ce19f63c 100644
> --- a/lib/tests/meson.build
> +++ b/lib/tests/meson.build
> @@ -8,6 +8,7 @@ lib_tests = [
> 'igt_dynamic_subtests',
> 'igt_edid',
> 'igt_exit_handler',
> + 'igt_facts',
> 'igt_fork',
> 'igt_fork_helper',
> 'igt_hook',
> diff --git a/runner/executor.c b/runner/executor.c index
> ac73e1dde..d1eca3c05 100644
> --- a/runner/executor.c
> +++ b/runner/executor.c
> @@ -30,6 +30,7 @@
>
> #include "igt_aux.h"
> #include "igt_core.h"
> +#include "igt_facts.h"
> #include "igt_taints.h"
> #include "igt_vec.h"
> #include "executor.h"
> @@ -2306,6 +2307,9 @@ bool execute(struct execute_state *state,
> sigset_t sigmask;
> double time_spent = 0.0;
> bool status = true;
> + char *last_test = NULL;
> +
> + igt_facts_lists_init();
>
> if (state->dry) {
> outf("Dry run, not executing. Invoke igt_resume if you want to
> execute.\n"); @@ -2438,6 +2442,10 @@ bool execute(struct execute_state
> *state,
> int result;
> bool already_written = false;
>
> + /* Calls before running each test */
> + igt_facts(last_test);
> + last_test = entry_display_name(&job_list->entries[state-
> >next]);
> +
> if (should_die_because_signal(sigfd)) {
> status = false;
> goto end;
> @@ -2526,6 +2534,8 @@ bool execute(struct execute_state *state,
> return execute(state, settings, job_list);
> }
> }
> + /* Last call to collect facts after the last test runs */
> + igt_facts(last_test);
>
> if ((timefd = openat(resdirfd, "endtime.txt", O_CREAT | O_WRONLY |
> O_EXCL, 0666)) >= 0) {
> dprintf(timefd, "%f\n", timeofday_double()); diff --git
> a/tools/lsfacts.c b/tools/lsfacts.c new file mode 100644 index
> 000000000..10dee0317
> --- /dev/null
> +++ b/tools/lsfacts.c
> @@ -0,0 +1,25 @@
> +// SPDX-License-Identifier: MIT
> +// Copyright C 2024 Intel Corporation
> +
> +#include "igt.h"
> +#include "igt_facts.h"
> +
> +/**
> + * SECTION:lsfacts
> + * @short_description: lsfacts
> + * @title: lsfacts
> + * @include: lsfacts.c
> + *
> + * # lsfacts
> + *
> + * Scan for igt-facts and print them on screen. Indicate if no facts are found.
> + */
> +int main(int argc, char *argv[])
> +{
> + igt_facts_lists_init();
> +
> + igt_facts("lsfacts");
> +
> + if (igt_facts_are_all_lists_empty())
> + igt_info("No facts found...\n");
> +}
> diff --git a/tools/meson.build b/tools/meson.build index
> 48c9a4b50..ff1b0ef90 100644
> --- a/tools/meson.build
> +++ b/tools/meson.build
> @@ -42,6 +42,7 @@ tools_progs = [
> 'intel_gem_info',
> 'intel_gvtg_test',
> 'dpcd_reg',
> + 'lsfacts',
> 'lsgpu',
> 'power',
> ]
> --
> 2.34.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking
[not found] <5a111c35-7245-4ada-a2a0-3fd0fd5bbeab@linux.intel.com>
@ 2024-12-05 14:05 ` Janusz Krzysztofik
2024-12-06 5:45 ` Peter Senna Tschudin
0 siblings, 1 reply; 13+ messages in thread
From: Janusz Krzysztofik @ 2024-12-05 14:05 UTC (permalink / raw)
To: igt-dev@lists.freedesktop.org, Peter Senna Tschudin
Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi,
luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray,
dominik.karol.piatkowski, katarzyna.piecielska,
Janusz Krzysztofik
Hi Peter,
On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote:
> When using igt-runner, collect facts before each test and after the
> last test, and report when facts change. The facts are:
> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
> - Kernel taints: kernel.is_tainted.taint_warn: true
> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
>
> This change imposes little execution overhead and adds just a few
> lines of logging. The facts will be printed on normal igt-runner
> output. Here is a real example from our CI shwoing
> hotreplug-lateclose changing the DRM card number
Since you give that as an example of how helpful your facts can be, and follow
that with a kernel taint example, that may indicate you think, and users of
your facts may then be mislead having that read, that the taint was related to
the change of card number, while both had nothing to do with each other.
Please add something like ', which is expected,' to your description. Changed
card number is expected, and that's nothing wrong with that. The old, still
open instance of the driver still exists. It is expected to be already
decoupled from hardware, but it still occupies its minor device number. Then,
new instance of the driver, attached to the same hardware, gets first free
minor number. Refresh of IGT device filter before health check can perfectly
handle that case.
> and tainting the
> kernel on the abort path:
No, hotreplug-lateclose doesn't taint the kernel. That's wrong conclusion from
what was reported, or wrong wording at least. Kernel taint (or lockdep not
active) must have been a result of driver issues (or maybe well known
limitations) exposed by the test. Then igt_runner took the abort path since
it detected those unhealthy conditions. Again, users of your facts may be
mislead if your messages can really suggest what you tell us about what you
think has happened.
The whole idea of testing i915 resistance to hot unplug scenarios came from
perceived cases of VFs potentially disappearing from VMs. Some significant
effort was put on that feature of the driver, but since the 'hot' test
variants were never unblocked in CI, things tended to get worse and worse over
time while new driver features that didn't care for hot unplug capability were
added.
Thanks,
Janusz
>
> [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose)
> [245.383596] Starting subtest: hotreplug-lateclose
> [249.843361] Aborting: Lockdep not active
> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
> [249.859075] Closing watchdogs
>
> CC: Ryszard Knop <ryszard.knop@intel.com>
> CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> CC: Lucas De Marchi <lucas.demarchi@intel.com>
> CC: luciano.coelho@intel.com
> CC: nirmoy.das@intel.com
> CC: stuart.summers@intel.com
> CC: himal.prasad.ghimiray@intel.com
> CC: dominik.karol.piatkowski@intel.com
> CC: katarzyna.piecielska@intel.com
> Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
> ---
> v10:
> - fix memory leaks from asprintf (Thank you Dominik Karol!)
> - fix comments for consistency (Thank you Dominik Karol!)
>
> v9:
> - do not report new hardware when loading/unloading kmod changes
> the string of the GPU name. I accidentally reintroduced this
> issue when refactoring to use linked lists.
> - add tools/lsfacts: 9 lines of code that print either the facts
> or that no facts were found.
> - fix code comments describing functions
> - fix white space issues
>
> v8:
> - fix white space issues
>
> v7:
> - refactor to use linked lists provided by igt_lists
> - Added function arguments to code comments
> - updated commit message
>
> v6:
> - sort includes in igt_facts.c alphabetically
> - add facts for kernel taints using igt_kernel_tainted() and
> igt_explain_taints()
>
> v5:
> - fix the broken patch format from v4
>
> v4:
> - fix a bug on delete_fact()
> - drop glib and calls to g_ functions
> - change commit message to indicate that report only on fact changes
> - use consistent format for reporting changes
> - fix SPDX header format
>
> v3:
> - refreshed commit message
> - changed format SPDX string
> - removed license text
> - replace last_test assignment when null by two ternary operators
> - added function descriptions following example found elsewhere in
> the code
> - added igt_assert to catch failures to realloc()
>
> v2:
> - add lib/tests/igt_facts.c for basic unit testing
> - bugfix: do not report a new gpu when the driver changes the gpu name
> - bugfix: do not report the pci_id twice on the gpu name
>
> lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++
> lib/igt_facts.h | 47 +++
> lib/meson.build | 1 +
> lib/tests/igt_facts.c | 15 +
> lib/tests/meson.build | 1 +
> runner/executor.c | 10 +
> tools/lsfacts.c | 25 ++
> tools/meson.build | 1 +
> 8 files changed, 855 insertions(+)
> create mode 100644 lib/igt_facts.c
> create mode 100644 lib/igt_facts.h
> create mode 100644 lib/tests/igt_facts.c
> create mode 100644 tools/lsfacts.c
>
> diff --git a/lib/igt_facts.c b/lib/igt_facts.c
> new file mode 100644
> index 000000000..4749d3417
> --- /dev/null
> +++ b/lib/igt_facts.c
> @@ -0,0 +1,755 @@
> +// SPDX-License-Identifier: MIT
> +// Copyright © 2024 Intel Corporation
> +
> +#include <ctype.h>
> +#include <libudev.h>
> +#include <stdio.h>
> +#include <sys/time.h>
> +#include <time.h>
> +
> +#include "igt_core.h"
> +#include "igt_device_scan.h"
> +#include "igt_facts.h"
> +#include "igt_kmod.h"
> +#include "igt_list.h"
> +#include "igt_taints.h"
> +
> +static struct igt_list_head igt_facts_list_drm_card_head;
> +static struct igt_list_head igt_facts_list_kmod_head;
> +static struct igt_list_head igt_facts_list_ktaint_head;
> +static struct igt_list_head igt_facts_list_pci_gpu_head;
> +
> +
> +/**
> + * igt_facts_lists_init:
> + *
> + * Initialize igt_facts linked lists.
> + *
> + * Returns: void
> + */
> +void igt_facts_lists_init(void)
> +{
> + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head);
> + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head);
> + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head);
> + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head);
> +}
> +
> +
> +/**
> + * igt_facts_log:
> + * @last_test: name of the test that triggered the fact
> + * @name: name of the fact
> + * @new_value: new value of the fact
> + * @old_value: old value of the fact
> + *
> + * Reports fact changes:
> + * - new fact: if old_value is NULL and new_value is not NULL
> + * - deleted fact: if new_value is NULL and old_value is not NULL
> + * - changed fact: if new_value is different from old_value
> + *
> + * Returns: void
> + */
> +static void igt_facts_log(const char *last_test, const char *name,
> + const char *new_value, const char *old_value)
> +{
> + struct timespec uptime_ts;
> + char *uptime = NULL;
> + const char *before_tests = "before any test";
> +
> + if (old_value == NULL && new_value == NULL)
> + return;
> +
> + if (clock_gettime(CLOCK_BOOTTIME, &uptime_ts) != 0)
> + return;
> +
> + asprintf(&uptime,
> + "%ld.%06ld",
> + uptime_ts.tv_sec,
> + uptime_ts.tv_nsec / 1000);
> +
> + /* New fact */
> + if (old_value == NULL && new_value != NULL) {
> + igt_info("[%s] [FACT %s] new: %s: %s\n",
> + uptime,
> + last_test ? last_test : before_tests,
> + name,
> + new_value);
> + goto out;
> + }
> +
> + /* Update fact */
> + if (old_value != NULL && new_value != NULL) {
> + igt_info("[%s] [FACT %s] changed: %s: %s -> %s\n",
> + uptime,
> + last_test ? last_test : before_tests,
> + name,
> + old_value,
> + new_value);
> + goto out;
> + }
> +
> + /* Deleted fact */
> + if (old_value != NULL && new_value == NULL) {
> + igt_info("[%s] [FACT %s] deleted: %s: %s\n",
> + uptime,
> + last_test ? last_test : before_tests,
> + name,
> + old_value);
> + goto out;
> + }
> +
> +out:
> + free(uptime);
> +}
> +
> +/**
> + * igt_facts_list_get:
> + * @name: name of the fact to be added
> + * @head: head of the list
> + *
> + * Get a fact from the list.
> + *
> + * Returns: pointer to the fact if found, NULL otherwise
> + *
> + */
> +static igt_fact *igt_facts_list_get(const char *name,
> + struct igt_list_head *head)
> +{
> + igt_fact *fact = NULL;
> +
> + if (igt_list_empty(head))
> + return NULL;
> +
> + igt_list_for_each_entry(fact, head, link) {
> + if (strcmp(fact->name, name) == 0)
> + return fact;
> + }
> + return NULL;
> +}
> +
> +/**
> + * igt_facts_list_del:
> + * @name: name of the fact to be added
> + * @head: head of the list
> + * @last_test: name of the last test
> + * @log: bool indicating if the delete operation should be logged
> + *
> + * Delete a fact from the list.
> + *
> + * Returns: bool indicating if fact was deleted from the list
> + *
> + */
> +static bool igt_facts_list_del(const char *name,
> + struct igt_list_head *head,
> + const char *last_test,
> + bool log)
> +{
> + igt_fact *fact = NULL;
> +
> + if (igt_list_empty(head))
> + return false;
> +
> + igt_list_for_each_entry(fact, head, link) {
> + if (strcmp(fact->name, name) == 0) {
> + if (log)
> + igt_facts_log(last_test, fact->name,
> + NULL, fact->value);
> +
> + igt_list_del(&fact->link);
> + free(fact->name);
> + free(fact->value);
> + free(fact->last_test);
> + free(fact);
> + return true;
> + }
> + }
> + return false;
> +}
> +
> +/**
> + * igt_facts_list_add:
> + * @name: name of the fact to be added
> + * @value: value of the fact to be added
> + * @last_test: name of the last test
> + * @head: head of the list
> + *
> + * Returns: bool indicating if fact was added to the list
> + *
> + */
> +static bool igt_facts_list_add(const char *name,
> + const char *value,
> + const char *last_test,
> + struct igt_list_head *head)
> +{
> + igt_fact *new_fact = NULL, *old_fact = NULL;
> + bool logged = false;
> +
> + if (name == NULL || value == NULL)
> + return false;
> +
> + old_fact = igt_facts_list_get(name, head);
> + if (old_fact) {
> + if (strcmp(old_fact->value, value) == 0) {
> + old_fact->present = true;
> + return false;
> + }
> + igt_facts_log(last_test, name, value, old_fact->value);
> + logged = true;
> + igt_facts_list_del(name, head, last_test, false);
> + }
> +
> + new_fact = malloc(sizeof(igt_fact));
> + if (new_fact == NULL)
> + return false;
> +
> + new_fact->name = strdup(name);
> + new_fact->value = strdup(value);
> + new_fact->last_test = last_test ? strdup(last_test) : NULL;
> + new_fact->present = true;
> +
> + if (!logged)
> + igt_facts_log(last_test, name, value, NULL);
> +
> + igt_list_add(&new_fact->link, head);
> +
> + return true;
> +}
> +
> +/**
> + * igt_facts_list_mark:
> + * @head: head of the list
> + *
> + * Mark all facts in the list as not present. Opted for the mark and sweep
> + * design pattern due to its simplicity and efficiency.
> + *
> + * Returns: void
> + */
> +static void igt_facts_list_mark(struct igt_list_head *head)
> +{
> + igt_fact *fact = NULL;
> +
> + if (igt_list_empty(head))
> + return;
> +
> + igt_list_for_each_entry(fact, head, link)
> + fact->present = false;
> +}
> +
> +/**
> + * igt_facts_list_sweep:
> + * @head: head of the list
> + * @last_test: name of the last test
> + *
> + * Sweep the list and delete all facts that are not present. Opted for the mark
> + * and sweep design pattern due to its simplicity and efficiency.
> + *
> + * Returns: void
> + */
> +static void igt_facts_list_sweep(struct igt_list_head *head,
> + const char *last_test)
> +{
> + igt_fact *fact = NULL, *tmp = NULL;
> +
> + if (igt_list_empty(head))
> + return;
> +
> + igt_list_for_each_entry_safe(fact, tmp, head, link)
> + if (!fact->present)
> + igt_facts_list_del(fact->name, head, last_test, true);
> +}
> +
> +/**
> + * igt_facts_list_mark_and_sweep:
> + * @head: head of the list
> + *
> + * Clean up the list using mark and sweep. Opted for the mark and sweep
> + * design pattern due to its simplicity and efficiency.
> + *
> + * Returns: void
> + */
> +static void igt_facts_list_mark_and_sweep(struct igt_list_head *head)
> +{
> + igt_facts_list_mark(head);
> + igt_facts_list_sweep(head, NULL);
> +}
> +
> +/**
> + * igt_facts_are_all_lists_empty:
> + *
> + * Returns true if all lists are empty. Used by the tool lsfacts.
> + *
> + * Returns: bool
> + */
> +bool igt_facts_are_all_lists_empty(void)
> +{
> + return igt_list_empty(&igt_facts_list_drm_card_head) &&
> + igt_list_empty(&igt_facts_list_kmod_head) &&
> + igt_list_empty(&igt_facts_list_ktaint_head) &&
> + igt_list_empty(&igt_facts_list_pci_gpu_head);
> +}
> +
> +/**
> + * igt_facts_scan_pci_gpus:
> + * @last_test: name of the last test
> + *
> + * This function scans the pci bus for gpus using udev. It uses
> + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to
> + * update igt_facts_list_pci_gpu_head.
> + *
> + * Returns: void
> + */
> +static void igt_facts_scan_pci_gpus(const char *last_test)
> +{
> + static struct igt_list_head *head = &igt_facts_list_pci_gpu_head;
> + struct udev *udev = NULL;
> + struct udev_enumerate *enumerate = NULL;
> + struct udev_list_entry *devices, *dev_list_entry;
> + struct igt_device_card card;
> + char pcistr[10];
> + int ret;
> + char *factname = NULL;
> + char *factvalue = NULL;
> +
> + udev = udev_new();
> + if (!udev) {
> + igt_warn("Failed to create udev context\n");
> + return;
> + }
> +
> + enumerate = udev_enumerate_new(udev);
> + if (!enumerate) {
> + igt_warn("Failed to create udev enumerate\n");
> + udev_unref(udev);
> + return;
> + }
> +
> + ret = udev_enumerate_add_match_subsystem(enumerate, "pci");
> + if (ret < 0)
> + goto out;
> +
> + ret = udev_enumerate_add_match_property(enumerate,
> + "PCI_CLASS",
> + "30000");
> + if (ret < 0)
> + goto out;
> +
> + ret = udev_enumerate_add_match_property(enumerate,
> + "PCI_CLASS",
> + "38000");
> + if (ret < 0)
> + goto out;
> +
> + ret = udev_enumerate_scan_devices(enumerate);
> + if (ret < 0)
> + goto out;
> +
> + devices = udev_enumerate_get_list_entry(enumerate);
> + if (!devices)
> + goto out;
> +
> + igt_facts_list_mark(head);
> +
> + udev_list_entry_foreach(dev_list_entry, devices) {
> + const char *path;
> + struct udev_device *udev_dev;
> + struct udev_list_entry *entry;
> + char *model = NULL;
> + char *codename = NULL;
> + igt_fact *old_fact = NULL;
> +
> + path = udev_list_entry_get_name(dev_list_entry);
> + udev_dev = udev_device_new_from_syspath(udev, path);
> + if (!udev_dev)
> + continue;
> +
> + /* Strip path to only the content after the last / */
> + path = strrchr(path, '/');
> + if (path)
> + path++;
> + else
> + path = "unknown";
> +
> + strcpy(card.pci_slot_name, "-");
> +
> + entry = udev_device_get_properties_list_entry(udev_dev);
> + while (entry) {
> + const char *name = udev_list_entry_get_name(entry);
> + const char *value = udev_list_entry_get_value(entry);
> +
> + entry = udev_list_entry_get_next(entry);
> + if (!strcmp(name, "ID_MODEL_FROM_DATABASE"))
> + model = strdup(value);
> + else if (!strcmp(name, "PCI_ID"))
> + igt_assert_eq(sscanf(value, "%hx:%hx",
> + &card.pci_vendor,
> + &card.pci_device), 2);
> + }
> + snprintf(pcistr, sizeof(pcistr), "%04x:%04x",
> + card.pci_vendor, card.pci_device);
> + codename = igt_device_get_pretty_name(&card, false);
> +
> + /* Set codename to null if it is the same string as pci_id */
> + if (codename && strcmp(pcistr, codename) == 0) {
> + free(codename);
> + codename = NULL;
> + }
> + asprintf(&factname, "%s.%s", pci_gpu_fact, path);
> + asprintf(&factvalue,
> + "%s %s %s",
> + pcistr,
> + codename ? codename : "",
> + model ? model : "");
> +
> + /**
> + * Loading and unloading the kmod may change the human
> + * readeable string in value. Do not change value if the
> + * pci id is the same.
> + */
> + old_fact = igt_facts_list_get(factname, head);
> + if (old_fact && strncmp(old_fact->value, factvalue, 9) == 0)
> + old_fact->present = true;
> + else
> + igt_facts_list_add(factname, factvalue, last_test, head);
> +
> + free(codename);
> + free(model);
> + free(factname);
> + free(factvalue);
> + udev_device_unref(udev_dev);
> + }
> +
> + igt_facts_list_sweep(head, last_test);
> +
> +out:
> + udev_enumerate_unref(enumerate);
> + udev_unref(udev);
> +}
> +
> +/**
> + * igt_facts_scan_pci_drm_cards:
> + * @last_test: name of the last test
> + *
> + * This function scans the pci bus for drm cards using udev. It uses the
> + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to
> + * update igt_facts_list_drm_card_head.
> + *
> + * Returns: void
> + */
> +static void igt_facts_scan_pci_drm_cards(const char *last_test)
> +{
> + static struct igt_list_head *head = &igt_facts_list_drm_card_head;
> + struct udev *udev = NULL;
> + struct udev_enumerate *enumerate = NULL;
> + struct udev_list_entry *devices, *dev_list_entry;
> + int ret;
> + char *factname = NULL;
> + char *factvalue = NULL;
> +
> + udev = udev_new();
> + if (!udev)
> + return;
> +
> + enumerate = udev_enumerate_new(udev);
> + if (!enumerate) {
> + udev_unref(udev);
> + return;
> + }
> +
> + ret = udev_enumerate_add_match_subsystem(enumerate, "drm");
> + if (ret < 0)
> + goto out;
> +
> + ret = udev_enumerate_scan_devices(enumerate);
> + if (ret < 0)
> + goto out;
> +
> + devices = udev_enumerate_get_list_entry(enumerate);
> + if (!devices)
> + goto out;
> +
> + ret = udev_enumerate_add_match_subsystem(enumerate, "drm");
> + if (ret < 0)
> + goto out;
> +
> + ret = udev_enumerate_scan_devices(enumerate);
> + if (ret < 0)
> + goto out;
> +
> + devices = udev_enumerate_get_list_entry(enumerate);
> + if (!devices)
> + goto out;
> +
> + igt_facts_list_mark(head);
> +
> + udev_list_entry_foreach(dev_list_entry, devices) {
> + const char *path;
> + struct udev_device *drm_dev, *pci_dev;
> + const char *drm_name, *pci_addr;
> +
> + path = udev_list_entry_get_name(dev_list_entry);
> + drm_dev = udev_device_new_from_syspath(udev, path);
> + if (!drm_dev)
> + continue;
> +
> + drm_name = udev_device_get_sysname(drm_dev);
> + /* Filter the device by name. Want devices such as card0 and card1.
> + * If the device has '-' in the name, contine
> + */
> + if (strncmp(drm_name, "card", 4) != 0 ||
> + strchr(drm_name, '-') != NULL) {
> + udev_device_unref(drm_dev);
> + continue;
> + }
> +
> + /* Get the pci address of the gpu associated with the drm_dev*/
> + pci_dev = udev_device_get_parent_with_subsystem_devtype(drm_dev,
> + "pci",
> + NULL);
> + if (pci_dev) {
> + pci_addr = udev_device_get_sysattr_value(pci_dev,
> + "address");
> + if (!pci_addr)
> + pci_addr = udev_device_get_sysname(pci_dev);
> + } else {
> + /* Some GPUs are platform devices. Ignore them. */
> + pci_addr = NULL;
> + udev_device_unref(drm_dev);
> + continue;
> + }
> +
> + asprintf(&factname, "%s.%s", drm_card_fact, pci_addr);
> + asprintf(&factvalue, "%s", drm_name);
> +
> + igt_facts_list_add(factname, factvalue, last_test, head);
> +
> + free(factname);
> + free(factvalue);
> + udev_device_unref(drm_dev);
> + }
> +
> + igt_facts_list_sweep(head, last_test);
> +
> +out:
> + udev_enumerate_unref(enumerate);
> + udev_unref(udev);
> +}
> +
> +/**
> + * igt_facts_scan_kernel_taints:
> + * @last_test: name of the last test
> + *
> + * This function scans for kernel taints using igt_kernel_tainted() and
> + * igt_explain_taints(). It will cut off the explanation keeping only the
> + * taint name.
> + *
> + * Returns: void
> + */
> +static void igt_facts_scan_kernel_taints(const char *last_test)
> +{
> + static struct igt_list_head *head = &igt_facts_list_ktaint_head;
> + unsigned long taints = 0;
> + const char *reason = NULL;
> + char *taint_name = NULL;
> + char *fact_name = NULL;
> +
> + taints = igt_kernel_tainted(&taints);
> + /* For testing, set all bits to 1
> + * taints = 0xFFFFFFFF;
> + */
> +
> +
> + igt_facts_list_mark(head);
> +
> + while ((reason = igt_explain_taints(&taints)) != NULL) {
> + /* Cut at the ':' to get only the taint name */
> + taint_name = strtok(strdup(reason), ":");
> + if (!taint_name)
> + continue;
> +
> + /* Lowercase taint_name */
> + for (int i = 0; taint_name[i]; i++)
> + taint_name[i] = tolower(taint_name[i]);
> +
> + asprintf(&fact_name, "%s.%s", ktaint_fact, taint_name);
> + igt_facts_list_add(fact_name, "true", last_test, head);
> +
> + free(taint_name);
> + free(fact_name);
> + }
> +
> + igt_facts_list_sweep(head, last_test);
> +}
> +
> +
> +/**
> + * igt_facts_scan_kernel_loaded_kmods:
> + * @last_test: name of the last test
> + *
> + * This function scans for loaded kmods using igt_fact_kmod_list and
> + * igt_kmod_is_loaded().
> + *
> + * Returns: void
> + */
> +static void igt_facts_scan_kernel_loaded_kmods(const char *last_test)
> +{
> + static struct igt_list_head *head = &igt_facts_list_kmod_head;
> + char *name = NULL;
> +
> + igt_facts_list_mark(head);
> +
> + /* Iterate over igt_fact_kmod_list[] until the element contains "\0" */
> + for (int i = 0; strcmp(igt_fact_kmod_list[i], "\0") != 0; i++) {
> + asprintf(&name, "%s.%s", kmod_fact, igt_fact_kmod_list[i]);
> + if (igt_kmod_is_loaded(igt_fact_kmod_list[i]))
> + igt_facts_list_add(name, "true", last_test, head);
> +
> + free(name);
> + }
> +
> + igt_facts_list_sweep(head, last_test);
> +}
> +
> +/**
> + * igt_facts:
> + * @last_test: name of the last test
> + *
> + * Call this function where you want to gather and report facts.
> + *
> + * Returns: void
> + */
> +void igt_facts(const char *last_test)
> +{
> + igt_facts_scan_pci_gpus(last_test);
> + igt_facts_scan_pci_drm_cards(last_test);
> + igt_facts_scan_kernel_taints(last_test);
> + igt_facts_scan_kernel_loaded_kmods(last_test);
> +
> + fflush(stdout);
> + fflush(stderr);
> +}
> +
> +/*
> + * Testing
> + *
> + * Defined here to keep most of the functions static
> + *
> + */
> +
> +/**
> + * igt_facts_test_add_get:
> + * @head: head of the list
> + *
> + * Tests igt_facts_list_add and igt_facts_list_get.
> + *
> + * Returns: void
> + */
> +static void igt_facts_test_add_get(struct igt_list_head *head)
> +{
> + igt_fact *fact = NULL;
> + bool ret;
> + const char *name = "hardware.pci.gpu_at_addr.0000:00:02.0";
> + const char *value = "8086:64a0 Intel Lunarlake (Gen20)";
> + const char *last_test = NULL;
> +
> + ret = igt_facts_list_add(name, value, last_test, head);
> + igt_assert(ret == true);
> +
> + // Assert that there is one element in the linked list
> + igt_assert_eq(igt_list_length(head), 1);
> +
> + // Assert that the element in the linked list is the one we added
> + fact = igt_facts_list_get(name, head);
> + igt_assert(fact != NULL);
> + igt_assert_eq(strcmp(fact->name, name), 0);
> + igt_assert_eq(strcmp(fact->value, value), 0);
> + igt_assert(fact->present == true);
> + igt_assert(fact->last_test == NULL);
> +}
> +
> +/**
> + * igt_facts_test_mark_and_sweep:
> + * @head: head of the list
> + *
> + * - Add 3 elements to the list and mark them as not present.
> + * - Update two of the elements and mark them as present.
> + * - Sweep the list and assert that
> + * - Only the two updated elements are present
> + * - The third element was deleted
> + *
> + * Returns: void
> + */
> +static void igt_facts_test_mark_and_sweep(struct igt_list_head *head)
> +{
> + igt_fact *fact = NULL;
> + const char *name1 = "hardware.pci.gpu_at_addr.0000:00:02.0";
> + const char *value1 = "8086:64a0 Intel Lunarlake (Gen20)";
> + const char *name2 = "hardware.pci.gpu_at_addr.0000:00:03.0";
> + const char *value2 = "8086:64a1 Intel Lunarlake (Gen21)";
> + const char *name3 = "hardware.pci.gpu_at_addr.0000:00:04.0";
> + const char *value3 = "8086:64a2 Intel Lunarlake (Gen22)";
> +
> + igt_facts_list_add(name1, value1, NULL, head);
> + igt_facts_list_add(name2, value2, NULL, head);
> + igt_facts_list_add(name3, value3, NULL, head);
> +
> + igt_facts_list_mark(head);
> +
> + igt_facts_list_add(name1, value1, NULL, head);
> + igt_facts_list_add(name2, value2, NULL, head);
> +
> + igt_facts_list_sweep(head, NULL);
> +
> + // Assert that there are two elements in the linked list
> + igt_assert_eq(igt_list_length(head), 2);
> +
> + // Assert that the two updated elements are present
> + fact = igt_facts_list_get(name1, head);
> + igt_assert(fact != NULL);
> + igt_assert(fact->present == true);
> +
> + fact = igt_facts_list_get(name2, head);
> + igt_assert(fact != NULL);
> + igt_assert(fact->present == true);
> +
> + // Assert that the third element was deleted
> + fact = igt_facts_list_get(name3, head);
> + igt_assert(fact == NULL);
> +}
> +
> +/**
> + * igt_facts_test:
> + *
> + * Main function for testing the igt_facts module
> + *
> + * Returns: bool indicating if the tests passed
> + */
> +void igt_facts_test(void)
> +{
> + const char *last_test = "Unit Testing";
> +
> + igt_facts_lists_init();
> +
> + /* Assert that all lists are empty */
> + igt_assert(igt_list_empty(&igt_facts_list_kmod_head));
> + igt_assert(igt_list_empty(&igt_facts_list_ktaint_head));
> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head));
> + igt_assert(igt_list_empty(&igt_facts_list_drm_card_head));
> +
> + /* Assert that add and get work. Will add one element to the list */
> + igt_facts_test_add_get(&igt_facts_list_pci_gpu_head);
> +
> + /* Assert that igt_facts_list_mark_and_sweep() cleans up the list */
> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == false);
> + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head);
> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == true);
> +
> + /* Test the mark and sweep pattern used to delete elements
> + * from the list
> + */
> + igt_facts_test_mark_and_sweep(&igt_facts_list_pci_gpu_head);
> +
> + /* Clean up the list and call igt_facts(). This should not crash */
> + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head);
> + igt_facts(last_test);
> +}
> diff --git a/lib/igt_facts.h b/lib/igt_facts.h
> new file mode 100644
> index 000000000..e4adca3fb
> --- /dev/null
> +++ b/lib/igt_facts.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: MIT
> + * Copyright © 2024 Intel Corporation
> + */
> +
> +#include <stdbool.h>
> +
> +#include "igt_list.h"
> +
> +
> +/* igt_fact:
> + * @name: name of the fact
> + * @value: value of the fact
> + * @last_test: name of the test that triggered the fact
> + * @present: bool indicating if fact is present. Used for deleting facts from
> + * the list.
> + * @link: link to the next fact
> + *
> + * A fact is a piece of information that can be used to determine the state of
> + * the system.
> + *
> + */
> +typedef struct {
> + char *name;
> + char *value;
> + char *last_test;
> + bool present; /* For mark and seep */
> + struct igt_list_head link;
> +} igt_fact;
> +
> +const char *igt_fact_kmod_list[] = {
> + "amdgpu",
> + "i915",
> + "nouveau",
> + "radeon",
> + "xe",
> + "\0"
> +};
> +
> +const char *kmod_fact = "kernel.kmod_is_loaded"; /* true or false */
> +const char *ktaint_fact = "kernel.is_tainted"; /* taint name: taint_warn */
> +const char *pci_gpu_fact = "hardware.pci.gpu_at_addr"; /* id vendor model */
> +const char *drm_card_fact = "hardware.pci.drm_card_at_addr"; /* cardX */
> +
> +void igt_facts_lists_init(void);
> +void igt_facts(const char *last_test);
> +bool igt_facts_are_all_lists_empty(void);
> +void igt_facts_test(void); /* For unit testing only */
> diff --git a/lib/meson.build b/lib/meson.build
> index c3556a921..c44ca2b5a 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -18,6 +18,7 @@ lib_sources = [
> 'i915/i915_crc.c',
> 'igt_collection.c',
> 'igt_color_encoding.c',
> + 'igt_facts.c',
> 'igt_crc.c',
> 'igt_debugfs.c',
> 'igt_device.c',
> diff --git a/lib/tests/igt_facts.c b/lib/tests/igt_facts.c
> new file mode 100644
> index 000000000..7fa9d0f22
> --- /dev/null
> +++ b/lib/tests/igt_facts.c
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: MIT
> +// Copyright © 2024 Intel Corporation
> +
> +#include <stdbool.h>
> +
> +#include "igt_core.h"
> +#include "igt_facts.h"
> +
> +/* Tests are not defined here so we can keep most of the functions static */
> +
> +igt_simple_main
> +{
> + igt_info("Running igt_facts_test\n");
> + igt_facts_test();
> +}
> diff --git a/lib/tests/meson.build b/lib/tests/meson.build
> index df8092638..1ce19f63c 100644
> --- a/lib/tests/meson.build
> +++ b/lib/tests/meson.build
> @@ -8,6 +8,7 @@ lib_tests = [
> 'igt_dynamic_subtests',
> 'igt_edid',
> 'igt_exit_handler',
> + 'igt_facts',
> 'igt_fork',
> 'igt_fork_helper',
> 'igt_hook',
> diff --git a/runner/executor.c b/runner/executor.c
> index ac73e1dde..d1eca3c05 100644
> --- a/runner/executor.c
> +++ b/runner/executor.c
> @@ -30,6 +30,7 @@
>
> #include "igt_aux.h"
> #include "igt_core.h"
> +#include "igt_facts.h"
> #include "igt_taints.h"
> #include "igt_vec.h"
> #include "executor.h"
> @@ -2306,6 +2307,9 @@ bool execute(struct execute_state *state,
> sigset_t sigmask;
> double time_spent = 0.0;
> bool status = true;
> + char *last_test = NULL;
> +
> + igt_facts_lists_init();
>
> if (state->dry) {
> outf("Dry run, not executing. Invoke igt_resume if you want to execute.\n");
> @@ -2438,6 +2442,10 @@ bool execute(struct execute_state *state,
> int result;
> bool already_written = false;
>
> + /* Calls before running each test */
> + igt_facts(last_test);
> + last_test = entry_display_name(&job_list->entries[state->next]);
> +
> if (should_die_because_signal(sigfd)) {
> status = false;
> goto end;
> @@ -2526,6 +2534,8 @@ bool execute(struct execute_state *state,
> return execute(state, settings, job_list);
> }
> }
> + /* Last call to collect facts after the last test runs */
> + igt_facts(last_test);
>
> if ((timefd = openat(resdirfd, "endtime.txt", O_CREAT | O_WRONLY | O_EXCL, 0666)) >= 0) {
> dprintf(timefd, "%f\n", timeofday_double());
> diff --git a/tools/lsfacts.c b/tools/lsfacts.c
> new file mode 100644
> index 000000000..10dee0317
> --- /dev/null
> +++ b/tools/lsfacts.c
> @@ -0,0 +1,25 @@
> +// SPDX-License-Identifier: MIT
> +// Copyright © 2024 Intel Corporation
> +
> +#include "igt.h"
> +#include "igt_facts.h"
> +
> +/**
> + * SECTION:lsfacts
> + * @short_description: lsfacts
> + * @title: lsfacts
> + * @include: lsfacts.c
> + *
> + * # lsfacts
> + *
> + * Scan for igt-facts and print them on screen. Indicate if no facts are found.
> + */
> +int main(int argc, char *argv[])
> +{
> + igt_facts_lists_init();
> +
> + igt_facts("lsfacts");
> +
> + if (igt_facts_are_all_lists_empty())
> + igt_info("No facts found...\n");
> +}
> diff --git a/tools/meson.build b/tools/meson.build
> index 48c9a4b50..ff1b0ef90 100644
> --- a/tools/meson.build
> +++ b/tools/meson.build
> @@ -42,6 +42,7 @@ tools_progs = [
> 'intel_gem_info',
> 'intel_gvtg_test',
> 'dpcd_reg',
> + 'lsfacts',
> 'lsgpu',
> 'power',
> ]
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking
2024-12-05 14:05 ` [PATCH i-g-t v10] igt-runner fact checking Janusz Krzysztofik
@ 2024-12-06 5:45 ` Peter Senna Tschudin
2024-12-09 9:17 ` Janusz Krzysztofik
0 siblings, 1 reply; 13+ messages in thread
From: Peter Senna Tschudin @ 2024-12-06 5:45 UTC (permalink / raw)
To: Janusz Krzysztofik, igt-dev@lists.freedesktop.org
Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi,
luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray,
dominik.karol.piatkowski, katarzyna.piecielska
Hi Janusz,
Thank you for your detailed comments. I appreciate the opportunity
to clarify and address your concerns.
On 05.12.2024 15:05, Janusz Krzysztofik wrote:
> Hi Peter,
>
> On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote:
>> When using igt-runner, collect facts before each test and after the
>> last test, and report when facts change. The facts are:
>> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
>> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
>> - Kernel taints: kernel.is_tainted.taint_warn: true
>> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
>>
>> This change imposes little execution overhead and adds just a few
>> lines of logging. The facts will be printed on normal igt-runner
>> output. Here is a real example from our CI shwoing
>> hotreplug-lateclose changing the DRM card number
>
> Since you give that as an example of how helpful your facts can be, and follow
> that with a kernel taint example, that may indicate you think, and users of
> your facts may then be mislead having that read, that the taint was related to
> the change of card number, while both had nothing to do with each other.
Let’s take a step back to define the purpose and scope of igt-facts:
- Definition of a fact from the dictionary: A fact is an objectively verifiable
piece of information.
- Purpose of igt-facts: Track which tests cause changes to the facts.
The operation is straightforward: facts are collected before and after each test,
and any differences are logged. Here’s an example showing a fact change and a new
fact after running hotreplug-lateclose:
[249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
[249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
This output highlights the facts without implying causation between them. The
tool(and my commit message) neither explains relationships between facts nor
misleads users into assuming causation.
>
> Please add something like ', which is expected,' to your description. Changed
> card number is expected, and that's nothing wrong with that. The old, still
> open instance of the driver still exists. It is expected to be already
> decoupled from hardware, but it still occupies its minor device number. Then,
> new instance of the driver, attached to the same hardware, gets first free
> minor number. Refresh of IGT device filter before health check can perfectly
> handle that case.
The igt-facts tool reports changes without qualifying them as expected or
unexpected. For example, the change in the DRM card number is logged as a fact,
irrespective of its expected nature.
>
>> and tainting the
>> kernel on the abort path:
>
> No, hotreplug-lateclose doesn't taint the kernel. That's wrong conclusion from
> what was reported, or wrong wording at least. Kernel taint (or lockdep not
> active) must have been a result of driver issues (or maybe well known
> limitations) exposed by the test. Then igt_runner took the abort path since
> it detected those unhealthy conditions. Again, users of your facts may be
> mislead if your messages can really suggest what you tell us about what you
> think has happened.
The kernel taint reported is a fact observed after the test. While its root cause
lies within the driver or other system components, this is outside the scope of
igt-facts. The report simply reflects what changed during the test.
To summarize:
- igt-facts is a factual reporting tool. It does not establish causation or
interpret changes.
- Both the DRM card number change and kernel taint were factual observations post
hotreplug-lateclose.
I hope this clarifies the intent and operation of igt-facts. Please let me know if
further discussion is needed.
Thank you,
Peter
>
> The whole idea of testing i915 resistance to hot unplug scenarios came from
> perceived cases of VFs potentially disappearing from VMs. Some significant
> effort was put on that feature of the driver, but since the 'hot' test
> variants were never unblocked in CI, things tended to get worse and worse over
> time while new driver features that didn't care for hot unplug capability were
> added.
>
> Thanks,
> Janusz
>
>>
>> [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose)
>> [245.383596] Starting subtest: hotreplug-lateclose
>> [249.843361] Aborting: Lockdep not active
>> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
>> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
>> [249.859075] Closing watchdogs
>>
>> CC: Ryszard Knop <ryszard.knop@intel.com>
>> CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
>> CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
>> CC: Lucas De Marchi <lucas.demarchi@intel.com>
>> CC: luciano.coelho@intel.com
>> CC: nirmoy.das@intel.com
>> CC: stuart.summers@intel.com
>> CC: himal.prasad.ghimiray@intel.com
>> CC: dominik.karol.piatkowski@intel.com
>> CC: katarzyna.piecielska@intel.com
>> Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
>> ---
>> v10:
>> - fix memory leaks from asprintf (Thank you Dominik Karol!)
>> - fix comments for consistency (Thank you Dominik Karol!)
>>
>> v9:
>> - do not report new hardware when loading/unloading kmod changes
>> the string of the GPU name. I accidentally reintroduced this
>> issue when refactoring to use linked lists.
>> - add tools/lsfacts: 9 lines of code that print either the facts
>> or that no facts were found.
>> - fix code comments describing functions
>> - fix white space issues
>>
>> v8:
>> - fix white space issues
>>
>> v7:
>> - refactor to use linked lists provided by igt_lists
>> - Added function arguments to code comments
>> - updated commit message
>>
>> v6:
>> - sort includes in igt_facts.c alphabetically
>> - add facts for kernel taints using igt_kernel_tainted() and
>> igt_explain_taints()
>>
>> v5:
>> - fix the broken patch format from v4
>>
>> v4:
>> - fix a bug on delete_fact()
>> - drop glib and calls to g_ functions
>> - change commit message to indicate that report only on fact changes
>> - use consistent format for reporting changes
>> - fix SPDX header format
>>
>> v3:
>> - refreshed commit message
>> - changed format SPDX string
>> - removed license text
>> - replace last_test assignment when null by two ternary operators
>> - added function descriptions following example found elsewhere in
>> the code
>> - added igt_assert to catch failures to realloc()
>>
>> v2:
>> - add lib/tests/igt_facts.c for basic unit testing
>> - bugfix: do not report a new gpu when the driver changes the gpu name
>> - bugfix: do not report the pci_id twice on the gpu name
>>
>> lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++
>> lib/igt_facts.h | 47 +++
>> lib/meson.build | 1 +
>> lib/tests/igt_facts.c | 15 +
>> lib/tests/meson.build | 1 +
>> runner/executor.c | 10 +
>> tools/lsfacts.c | 25 ++
>> tools/meson.build | 1 +
>> 8 files changed, 855 insertions(+)
>> create mode 100644 lib/igt_facts.c
>> create mode 100644 lib/igt_facts.h
>> create mode 100644 lib/tests/igt_facts.c
>> create mode 100644 tools/lsfacts.c
>>
>> diff --git a/lib/igt_facts.c b/lib/igt_facts.c
>> new file mode 100644
>> index 000000000..4749d3417
>> --- /dev/null
>> +++ b/lib/igt_facts.c
>> @@ -0,0 +1,755 @@
>> +// SPDX-License-Identifier: MIT
>> +// Copyright © 2024 Intel Corporation
>> +
>> +#include <ctype.h>
>> +#include <libudev.h>
>> +#include <stdio.h>
>> +#include <sys/time.h>
>> +#include <time.h>
>> +
>> +#include "igt_core.h"
>> +#include "igt_device_scan.h"
>> +#include "igt_facts.h"
>> +#include "igt_kmod.h"
>> +#include "igt_list.h"
>> +#include "igt_taints.h"
>> +
>> +static struct igt_list_head igt_facts_list_drm_card_head;
>> +static struct igt_list_head igt_facts_list_kmod_head;
>> +static struct igt_list_head igt_facts_list_ktaint_head;
>> +static struct igt_list_head igt_facts_list_pci_gpu_head;
>> +
>> +
>> +/**
>> + * igt_facts_lists_init:
>> + *
>> + * Initialize igt_facts linked lists.
>> + *
>> + * Returns: void
>> + */
>> +void igt_facts_lists_init(void)
>> +{
>> + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head);
>> + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head);
>> + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head);
>> + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head);
>> +}
>> +
>> +
>> +/**
>> + * igt_facts_log:
>> + * @last_test: name of the test that triggered the fact
>> + * @name: name of the fact
>> + * @new_value: new value of the fact
>> + * @old_value: old value of the fact
>> + *
>> + * Reports fact changes:
>> + * - new fact: if old_value is NULL and new_value is not NULL
>> + * - deleted fact: if new_value is NULL and old_value is not NULL
>> + * - changed fact: if new_value is different from old_value
>> + *
>> + * Returns: void
>> + */
>> +static void igt_facts_log(const char *last_test, const char *name,
>> + const char *new_value, const char *old_value)
>> +{
>> + struct timespec uptime_ts;
>> + char *uptime = NULL;
>> + const char *before_tests = "before any test";
>> +
>> + if (old_value == NULL && new_value == NULL)
>> + return;
>> +
>> + if (clock_gettime(CLOCK_BOOTTIME, &uptime_ts) != 0)
>> + return;
>> +
>> + asprintf(&uptime,
>> + "%ld.%06ld",
>> + uptime_ts.tv_sec,
>> + uptime_ts.tv_nsec / 1000);
>> +
>> + /* New fact */
>> + if (old_value == NULL && new_value != NULL) {
>> + igt_info("[%s] [FACT %s] new: %s: %s\n",
>> + uptime,
>> + last_test ? last_test : before_tests,
>> + name,
>> + new_value);
>> + goto out;
>> + }
>> +
>> + /* Update fact */
>> + if (old_value != NULL && new_value != NULL) {
>> + igt_info("[%s] [FACT %s] changed: %s: %s -> %s\n",
>> + uptime,
>> + last_test ? last_test : before_tests,
>> + name,
>> + old_value,
>> + new_value);
>> + goto out;
>> + }
>> +
>> + /* Deleted fact */
>> + if (old_value != NULL && new_value == NULL) {
>> + igt_info("[%s] [FACT %s] deleted: %s: %s\n",
>> + uptime,
>> + last_test ? last_test : before_tests,
>> + name,
>> + old_value);
>> + goto out;
>> + }
>> +
>> +out:
>> + free(uptime);
>> +}
>> +
>> +/**
>> + * igt_facts_list_get:
>> + * @name: name of the fact to be added
>> + * @head: head of the list
>> + *
>> + * Get a fact from the list.
>> + *
>> + * Returns: pointer to the fact if found, NULL otherwise
>> + *
>> + */
>> +static igt_fact *igt_facts_list_get(const char *name,
>> + struct igt_list_head *head)
>> +{
>> + igt_fact *fact = NULL;
>> +
>> + if (igt_list_empty(head))
>> + return NULL;
>> +
>> + igt_list_for_each_entry(fact, head, link) {
>> + if (strcmp(fact->name, name) == 0)
>> + return fact;
>> + }
>> + return NULL;
>> +}
>> +
>> +/**
>> + * igt_facts_list_del:
>> + * @name: name of the fact to be added
>> + * @head: head of the list
>> + * @last_test: name of the last test
>> + * @log: bool indicating if the delete operation should be logged
>> + *
>> + * Delete a fact from the list.
>> + *
>> + * Returns: bool indicating if fact was deleted from the list
>> + *
>> + */
>> +static bool igt_facts_list_del(const char *name,
>> + struct igt_list_head *head,
>> + const char *last_test,
>> + bool log)
>> +{
>> + igt_fact *fact = NULL;
>> +
>> + if (igt_list_empty(head))
>> + return false;
>> +
>> + igt_list_for_each_entry(fact, head, link) {
>> + if (strcmp(fact->name, name) == 0) {
>> + if (log)
>> + igt_facts_log(last_test, fact->name,
>> + NULL, fact->value);
>> +
>> + igt_list_del(&fact->link);
>> + free(fact->name);
>> + free(fact->value);
>> + free(fact->last_test);
>> + free(fact);
>> + return true;
>> + }
>> + }
>> + return false;
>> +}
>> +
>> +/**
>> + * igt_facts_list_add:
>> + * @name: name of the fact to be added
>> + * @value: value of the fact to be added
>> + * @last_test: name of the last test
>> + * @head: head of the list
>> + *
>> + * Returns: bool indicating if fact was added to the list
>> + *
>> + */
>> +static bool igt_facts_list_add(const char *name,
>> + const char *value,
>> + const char *last_test,
>> + struct igt_list_head *head)
>> +{
>> + igt_fact *new_fact = NULL, *old_fact = NULL;
>> + bool logged = false;
>> +
>> + if (name == NULL || value == NULL)
>> + return false;
>> +
>> + old_fact = igt_facts_list_get(name, head);
>> + if (old_fact) {
>> + if (strcmp(old_fact->value, value) == 0) {
>> + old_fact->present = true;
>> + return false;
>> + }
>> + igt_facts_log(last_test, name, value, old_fact->value);
>> + logged = true;
>> + igt_facts_list_del(name, head, last_test, false);
>> + }
>> +
>> + new_fact = malloc(sizeof(igt_fact));
>> + if (new_fact == NULL)
>> + return false;
>> +
>> + new_fact->name = strdup(name);
>> + new_fact->value = strdup(value);
>> + new_fact->last_test = last_test ? strdup(last_test) : NULL;
>> + new_fact->present = true;
>> +
>> + if (!logged)
>> + igt_facts_log(last_test, name, value, NULL);
>> +
>> + igt_list_add(&new_fact->link, head);
>> +
>> + return true;
>> +}
>> +
>> +/**
>> + * igt_facts_list_mark:
>> + * @head: head of the list
>> + *
>> + * Mark all facts in the list as not present. Opted for the mark and sweep
>> + * design pattern due to its simplicity and efficiency.
>> + *
>> + * Returns: void
>> + */
>> +static void igt_facts_list_mark(struct igt_list_head *head)
>> +{
>> + igt_fact *fact = NULL;
>> +
>> + if (igt_list_empty(head))
>> + return;
>> +
>> + igt_list_for_each_entry(fact, head, link)
>> + fact->present = false;
>> +}
>> +
>> +/**
>> + * igt_facts_list_sweep:
>> + * @head: head of the list
>> + * @last_test: name of the last test
>> + *
>> + * Sweep the list and delete all facts that are not present. Opted for the mark
>> + * and sweep design pattern due to its simplicity and efficiency.
>> + *
>> + * Returns: void
>> + */
>> +static void igt_facts_list_sweep(struct igt_list_head *head,
>> + const char *last_test)
>> +{
>> + igt_fact *fact = NULL, *tmp = NULL;
>> +
>> + if (igt_list_empty(head))
>> + return;
>> +
>> + igt_list_for_each_entry_safe(fact, tmp, head, link)
>> + if (!fact->present)
>> + igt_facts_list_del(fact->name, head, last_test, true);
>> +}
>> +
>> +/**
>> + * igt_facts_list_mark_and_sweep:
>> + * @head: head of the list
>> + *
>> + * Clean up the list using mark and sweep. Opted for the mark and sweep
>> + * design pattern due to its simplicity and efficiency.
>> + *
>> + * Returns: void
>> + */
>> +static void igt_facts_list_mark_and_sweep(struct igt_list_head *head)
>> +{
>> + igt_facts_list_mark(head);
>> + igt_facts_list_sweep(head, NULL);
>> +}
>> +
>> +/**
>> + * igt_facts_are_all_lists_empty:
>> + *
>> + * Returns true if all lists are empty. Used by the tool lsfacts.
>> + *
>> + * Returns: bool
>> + */
>> +bool igt_facts_are_all_lists_empty(void)
>> +{
>> + return igt_list_empty(&igt_facts_list_drm_card_head) &&
>> + igt_list_empty(&igt_facts_list_kmod_head) &&
>> + igt_list_empty(&igt_facts_list_ktaint_head) &&
>> + igt_list_empty(&igt_facts_list_pci_gpu_head);
>> +}
>> +
>> +/**
>> + * igt_facts_scan_pci_gpus:
>> + * @last_test: name of the last test
>> + *
>> + * This function scans the pci bus for gpus using udev. It uses
>> + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to
>> + * update igt_facts_list_pci_gpu_head.
>> + *
>> + * Returns: void
>> + */
>> +static void igt_facts_scan_pci_gpus(const char *last_test)
>> +{
>> + static struct igt_list_head *head = &igt_facts_list_pci_gpu_head;
>> + struct udev *udev = NULL;
>> + struct udev_enumerate *enumerate = NULL;
>> + struct udev_list_entry *devices, *dev_list_entry;
>> + struct igt_device_card card;
>> + char pcistr[10];
>> + int ret;
>> + char *factname = NULL;
>> + char *factvalue = NULL;
>> +
>> + udev = udev_new();
>> + if (!udev) {
>> + igt_warn("Failed to create udev context\n");
>> + return;
>> + }
>> +
>> + enumerate = udev_enumerate_new(udev);
>> + if (!enumerate) {
>> + igt_warn("Failed to create udev enumerate\n");
>> + udev_unref(udev);
>> + return;
>> + }
>> +
>> + ret = udev_enumerate_add_match_subsystem(enumerate, "pci");
>> + if (ret < 0)
>> + goto out;
>> +
>> + ret = udev_enumerate_add_match_property(enumerate,
>> + "PCI_CLASS",
>> + "30000");
>> + if (ret < 0)
>> + goto out;
>> +
>> + ret = udev_enumerate_add_match_property(enumerate,
>> + "PCI_CLASS",
>> + "38000");
>> + if (ret < 0)
>> + goto out;
>> +
>> + ret = udev_enumerate_scan_devices(enumerate);
>> + if (ret < 0)
>> + goto out;
>> +
>> + devices = udev_enumerate_get_list_entry(enumerate);
>> + if (!devices)
>> + goto out;
>> +
>> + igt_facts_list_mark(head);
>> +
>> + udev_list_entry_foreach(dev_list_entry, devices) {
>> + const char *path;
>> + struct udev_device *udev_dev;
>> + struct udev_list_entry *entry;
>> + char *model = NULL;
>> + char *codename = NULL;
>> + igt_fact *old_fact = NULL;
>> +
>> + path = udev_list_entry_get_name(dev_list_entry);
>> + udev_dev = udev_device_new_from_syspath(udev, path);
>> + if (!udev_dev)
>> + continue;
>> +
>> + /* Strip path to only the content after the last / */
>> + path = strrchr(path, '/');
>> + if (path)
>> + path++;
>> + else
>> + path = "unknown";
>> +
>> + strcpy(card.pci_slot_name, "-");
>> +
>> + entry = udev_device_get_properties_list_entry(udev_dev);
>> + while (entry) {
>> + const char *name = udev_list_entry_get_name(entry);
>> + const char *value = udev_list_entry_get_value(entry);
>> +
>> + entry = udev_list_entry_get_next(entry);
>> + if (!strcmp(name, "ID_MODEL_FROM_DATABASE"))
>> + model = strdup(value);
>> + else if (!strcmp(name, "PCI_ID"))
>> + igt_assert_eq(sscanf(value, "%hx:%hx",
>> + &card.pci_vendor,
>> + &card.pci_device), 2);
>> + }
>> + snprintf(pcistr, sizeof(pcistr), "%04x:%04x",
>> + card.pci_vendor, card.pci_device);
>> + codename = igt_device_get_pretty_name(&card, false);
>> +
>> + /* Set codename to null if it is the same string as pci_id */
>> + if (codename && strcmp(pcistr, codename) == 0) {
>> + free(codename);
>> + codename = NULL;
>> + }
>> + asprintf(&factname, "%s.%s", pci_gpu_fact, path);
>> + asprintf(&factvalue,
>> + "%s %s %s",
>> + pcistr,
>> + codename ? codename : "",
>> + model ? model : "");
>> +
>> + /**
>> + * Loading and unloading the kmod may change the human
>> + * readeable string in value. Do not change value if the
>> + * pci id is the same.
>> + */
>> + old_fact = igt_facts_list_get(factname, head);
>> + if (old_fact && strncmp(old_fact->value, factvalue, 9) == 0)
>> + old_fact->present = true;
>> + else
>> + igt_facts_list_add(factname, factvalue, last_test, head);
>> +
>> + free(codename);
>> + free(model);
>> + free(factname);
>> + free(factvalue);
>> + udev_device_unref(udev_dev);
>> + }
>> +
>> + igt_facts_list_sweep(head, last_test);
>> +
>> +out:
>> + udev_enumerate_unref(enumerate);
>> + udev_unref(udev);
>> +}
>> +
>> +/**
>> + * igt_facts_scan_pci_drm_cards:
>> + * @last_test: name of the last test
>> + *
>> + * This function scans the pci bus for drm cards using udev. It uses the
>> + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to
>> + * update igt_facts_list_drm_card_head.
>> + *
>> + * Returns: void
>> + */
>> +static void igt_facts_scan_pci_drm_cards(const char *last_test)
>> +{
>> + static struct igt_list_head *head = &igt_facts_list_drm_card_head;
>> + struct udev *udev = NULL;
>> + struct udev_enumerate *enumerate = NULL;
>> + struct udev_list_entry *devices, *dev_list_entry;
>> + int ret;
>> + char *factname = NULL;
>> + char *factvalue = NULL;
>> +
>> + udev = udev_new();
>> + if (!udev)
>> + return;
>> +
>> + enumerate = udev_enumerate_new(udev);
>> + if (!enumerate) {
>> + udev_unref(udev);
>> + return;
>> + }
>> +
>> + ret = udev_enumerate_add_match_subsystem(enumerate, "drm");
>> + if (ret < 0)
>> + goto out;
>> +
>> + ret = udev_enumerate_scan_devices(enumerate);
>> + if (ret < 0)
>> + goto out;
>> +
>> + devices = udev_enumerate_get_list_entry(enumerate);
>> + if (!devices)
>> + goto out;
>> +
>> + ret = udev_enumerate_add_match_subsystem(enumerate, "drm");
>> + if (ret < 0)
>> + goto out;
>> +
>> + ret = udev_enumerate_scan_devices(enumerate);
>> + if (ret < 0)
>> + goto out;
>> +
>> + devices = udev_enumerate_get_list_entry(enumerate);
>> + if (!devices)
>> + goto out;
>> +
>> + igt_facts_list_mark(head);
>> +
>> + udev_list_entry_foreach(dev_list_entry, devices) {
>> + const char *path;
>> + struct udev_device *drm_dev, *pci_dev;
>> + const char *drm_name, *pci_addr;
>> +
>> + path = udev_list_entry_get_name(dev_list_entry);
>> + drm_dev = udev_device_new_from_syspath(udev, path);
>> + if (!drm_dev)
>> + continue;
>> +
>> + drm_name = udev_device_get_sysname(drm_dev);
>> + /* Filter the device by name. Want devices such as card0 and card1.
>> + * If the device has '-' in the name, contine
>> + */
>> + if (strncmp(drm_name, "card", 4) != 0 ||
>> + strchr(drm_name, '-') != NULL) {
>> + udev_device_unref(drm_dev);
>> + continue;
>> + }
>> +
>> + /* Get the pci address of the gpu associated with the drm_dev*/
>> + pci_dev = udev_device_get_parent_with_subsystem_devtype(drm_dev,
>> + "pci",
>> + NULL);
>> + if (pci_dev) {
>> + pci_addr = udev_device_get_sysattr_value(pci_dev,
>> + "address");
>> + if (!pci_addr)
>> + pci_addr = udev_device_get_sysname(pci_dev);
>> + } else {
>> + /* Some GPUs are platform devices. Ignore them. */
>> + pci_addr = NULL;
>> + udev_device_unref(drm_dev);
>> + continue;
>> + }
>> +
>> + asprintf(&factname, "%s.%s", drm_card_fact, pci_addr);
>> + asprintf(&factvalue, "%s", drm_name);
>> +
>> + igt_facts_list_add(factname, factvalue, last_test, head);
>> +
>> + free(factname);
>> + free(factvalue);
>> + udev_device_unref(drm_dev);
>> + }
>> +
>> + igt_facts_list_sweep(head, last_test);
>> +
>> +out:
>> + udev_enumerate_unref(enumerate);
>> + udev_unref(udev);
>> +}
>> +
>> +/**
>> + * igt_facts_scan_kernel_taints:
>> + * @last_test: name of the last test
>> + *
>> + * This function scans for kernel taints using igt_kernel_tainted() and
>> + * igt_explain_taints(). It will cut off the explanation keeping only the
>> + * taint name.
>> + *
>> + * Returns: void
>> + */
>> +static void igt_facts_scan_kernel_taints(const char *last_test)
>> +{
>> + static struct igt_list_head *head = &igt_facts_list_ktaint_head;
>> + unsigned long taints = 0;
>> + const char *reason = NULL;
>> + char *taint_name = NULL;
>> + char *fact_name = NULL;
>> +
>> + taints = igt_kernel_tainted(&taints);
>> + /* For testing, set all bits to 1
>> + * taints = 0xFFFFFFFF;
>> + */
>> +
>> +
>> + igt_facts_list_mark(head);
>> +
>> + while ((reason = igt_explain_taints(&taints)) != NULL) {
>> + /* Cut at the ':' to get only the taint name */
>> + taint_name = strtok(strdup(reason), ":");
>> + if (!taint_name)
>> + continue;
>> +
>> + /* Lowercase taint_name */
>> + for (int i = 0; taint_name[i]; i++)
>> + taint_name[i] = tolower(taint_name[i]);
>> +
>> + asprintf(&fact_name, "%s.%s", ktaint_fact, taint_name);
>> + igt_facts_list_add(fact_name, "true", last_test, head);
>> +
>> + free(taint_name);
>> + free(fact_name);
>> + }
>> +
>> + igt_facts_list_sweep(head, last_test);
>> +}
>> +
>> +
>> +/**
>> + * igt_facts_scan_kernel_loaded_kmods:
>> + * @last_test: name of the last test
>> + *
>> + * This function scans for loaded kmods using igt_fact_kmod_list and
>> + * igt_kmod_is_loaded().
>> + *
>> + * Returns: void
>> + */
>> +static void igt_facts_scan_kernel_loaded_kmods(const char *last_test)
>> +{
>> + static struct igt_list_head *head = &igt_facts_list_kmod_head;
>> + char *name = NULL;
>> +
>> + igt_facts_list_mark(head);
>> +
>> + /* Iterate over igt_fact_kmod_list[] until the element contains "\0" */
>> + for (int i = 0; strcmp(igt_fact_kmod_list[i], "\0") != 0; i++) {
>> + asprintf(&name, "%s.%s", kmod_fact, igt_fact_kmod_list[i]);
>> + if (igt_kmod_is_loaded(igt_fact_kmod_list[i]))
>> + igt_facts_list_add(name, "true", last_test, head);
>> +
>> + free(name);
>> + }
>> +
>> + igt_facts_list_sweep(head, last_test);
>> +}
>> +
>> +/**
>> + * igt_facts:
>> + * @last_test: name of the last test
>> + *
>> + * Call this function where you want to gather and report facts.
>> + *
>> + * Returns: void
>> + */
>> +void igt_facts(const char *last_test)
>> +{
>> + igt_facts_scan_pci_gpus(last_test);
>> + igt_facts_scan_pci_drm_cards(last_test);
>> + igt_facts_scan_kernel_taints(last_test);
>> + igt_facts_scan_kernel_loaded_kmods(last_test);
>> +
>> + fflush(stdout);
>> + fflush(stderr);
>> +}
>> +
>> +/*
>> + * Testing
>> + *
>> + * Defined here to keep most of the functions static
>> + *
>> + */
>> +
>> +/**
>> + * igt_facts_test_add_get:
>> + * @head: head of the list
>> + *
>> + * Tests igt_facts_list_add and igt_facts_list_get.
>> + *
>> + * Returns: void
>> + */
>> +static void igt_facts_test_add_get(struct igt_list_head *head)
>> +{
>> + igt_fact *fact = NULL;
>> + bool ret;
>> + const char *name = "hardware.pci.gpu_at_addr.0000:00:02.0";
>> + const char *value = "8086:64a0 Intel Lunarlake (Gen20)";
>> + const char *last_test = NULL;
>> +
>> + ret = igt_facts_list_add(name, value, last_test, head);
>> + igt_assert(ret == true);
>> +
>> + // Assert that there is one element in the linked list
>> + igt_assert_eq(igt_list_length(head), 1);
>> +
>> + // Assert that the element in the linked list is the one we added
>> + fact = igt_facts_list_get(name, head);
>> + igt_assert(fact != NULL);
>> + igt_assert_eq(strcmp(fact->name, name), 0);
>> + igt_assert_eq(strcmp(fact->value, value), 0);
>> + igt_assert(fact->present == true);
>> + igt_assert(fact->last_test == NULL);
>> +}
>> +
>> +/**
>> + * igt_facts_test_mark_and_sweep:
>> + * @head: head of the list
>> + *
>> + * - Add 3 elements to the list and mark them as not present.
>> + * - Update two of the elements and mark them as present.
>> + * - Sweep the list and assert that
>> + * - Only the two updated elements are present
>> + * - The third element was deleted
>> + *
>> + * Returns: void
>> + */
>> +static void igt_facts_test_mark_and_sweep(struct igt_list_head *head)
>> +{
>> + igt_fact *fact = NULL;
>> + const char *name1 = "hardware.pci.gpu_at_addr.0000:00:02.0";
>> + const char *value1 = "8086:64a0 Intel Lunarlake (Gen20)";
>> + const char *name2 = "hardware.pci.gpu_at_addr.0000:00:03.0";
>> + const char *value2 = "8086:64a1 Intel Lunarlake (Gen21)";
>> + const char *name3 = "hardware.pci.gpu_at_addr.0000:00:04.0";
>> + const char *value3 = "8086:64a2 Intel Lunarlake (Gen22)";
>> +
>> + igt_facts_list_add(name1, value1, NULL, head);
>> + igt_facts_list_add(name2, value2, NULL, head);
>> + igt_facts_list_add(name3, value3, NULL, head);
>> +
>> + igt_facts_list_mark(head);
>> +
>> + igt_facts_list_add(name1, value1, NULL, head);
>> + igt_facts_list_add(name2, value2, NULL, head);
>> +
>> + igt_facts_list_sweep(head, NULL);
>> +
>> + // Assert that there are two elements in the linked list
>> + igt_assert_eq(igt_list_length(head), 2);
>> +
>> + // Assert that the two updated elements are present
>> + fact = igt_facts_list_get(name1, head);
>> + igt_assert(fact != NULL);
>> + igt_assert(fact->present == true);
>> +
>> + fact = igt_facts_list_get(name2, head);
>> + igt_assert(fact != NULL);
>> + igt_assert(fact->present == true);
>> +
>> + // Assert that the third element was deleted
>> + fact = igt_facts_list_get(name3, head);
>> + igt_assert(fact == NULL);
>> +}
>> +
>> +/**
>> + * igt_facts_test:
>> + *
>> + * Main function for testing the igt_facts module
>> + *
>> + * Returns: bool indicating if the tests passed
>> + */
>> +void igt_facts_test(void)
>> +{
>> + const char *last_test = "Unit Testing";
>> +
>> + igt_facts_lists_init();
>> +
>> + /* Assert that all lists are empty */
>> + igt_assert(igt_list_empty(&igt_facts_list_kmod_head));
>> + igt_assert(igt_list_empty(&igt_facts_list_ktaint_head));
>> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head));
>> + igt_assert(igt_list_empty(&igt_facts_list_drm_card_head));
>> +
>> + /* Assert that add and get work. Will add one element to the list */
>> + igt_facts_test_add_get(&igt_facts_list_pci_gpu_head);
>> +
>> + /* Assert that igt_facts_list_mark_and_sweep() cleans up the list */
>> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == false);
>> + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head);
>> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == true);
>> +
>> + /* Test the mark and sweep pattern used to delete elements
>> + * from the list
>> + */
>> + igt_facts_test_mark_and_sweep(&igt_facts_list_pci_gpu_head);
>> +
>> + /* Clean up the list and call igt_facts(). This should not crash */
>> + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head);
>> + igt_facts(last_test);
>> +}
>> diff --git a/lib/igt_facts.h b/lib/igt_facts.h
>> new file mode 100644
>> index 000000000..e4adca3fb
>> --- /dev/null
>> +++ b/lib/igt_facts.h
>> @@ -0,0 +1,47 @@
>> +/* SPDX-License-Identifier: MIT
>> + * Copyright © 2024 Intel Corporation
>> + */
>> +
>> +#include <stdbool.h>
>> +
>> +#include "igt_list.h"
>> +
>> +
>> +/* igt_fact:
>> + * @name: name of the fact
>> + * @value: value of the fact
>> + * @last_test: name of the test that triggered the fact
>> + * @present: bool indicating if fact is present. Used for deleting facts from
>> + * the list.
>> + * @link: link to the next fact
>> + *
>> + * A fact is a piece of information that can be used to determine the state of
>> + * the system.
>> + *
>> + */
>> +typedef struct {
>> + char *name;
>> + char *value;
>> + char *last_test;
>> + bool present; /* For mark and seep */
>> + struct igt_list_head link;
>> +} igt_fact;
>> +
>> +const char *igt_fact_kmod_list[] = {
>> + "amdgpu",
>> + "i915",
>> + "nouveau",
>> + "radeon",
>> + "xe",
>> + "\0"
>> +};
>> +
>> +const char *kmod_fact = "kernel.kmod_is_loaded"; /* true or false */
>> +const char *ktaint_fact = "kernel.is_tainted"; /* taint name: taint_warn */
>> +const char *pci_gpu_fact = "hardware.pci.gpu_at_addr"; /* id vendor model */
>> +const char *drm_card_fact = "hardware.pci.drm_card_at_addr"; /* cardX */
>> +
>> +void igt_facts_lists_init(void);
>> +void igt_facts(const char *last_test);
>> +bool igt_facts_are_all_lists_empty(void);
>> +void igt_facts_test(void); /* For unit testing only */
>> diff --git a/lib/meson.build b/lib/meson.build
>> index c3556a921..c44ca2b5a 100644
>> --- a/lib/meson.build
>> +++ b/lib/meson.build
>> @@ -18,6 +18,7 @@ lib_sources = [
>> 'i915/i915_crc.c',
>> 'igt_collection.c',
>> 'igt_color_encoding.c',
>> + 'igt_facts.c',
>> 'igt_crc.c',
>> 'igt_debugfs.c',
>> 'igt_device.c',
>> diff --git a/lib/tests/igt_facts.c b/lib/tests/igt_facts.c
>> new file mode 100644
>> index 000000000..7fa9d0f22
>> --- /dev/null
>> +++ b/lib/tests/igt_facts.c
>> @@ -0,0 +1,15 @@
>> +// SPDX-License-Identifier: MIT
>> +// Copyright © 2024 Intel Corporation
>> +
>> +#include <stdbool.h>
>> +
>> +#include "igt_core.h"
>> +#include "igt_facts.h"
>> +
>> +/* Tests are not defined here so we can keep most of the functions static */
>> +
>> +igt_simple_main
>> +{
>> + igt_info("Running igt_facts_test\n");
>> + igt_facts_test();
>> +}
>> diff --git a/lib/tests/meson.build b/lib/tests/meson.build
>> index df8092638..1ce19f63c 100644
>> --- a/lib/tests/meson.build
>> +++ b/lib/tests/meson.build
>> @@ -8,6 +8,7 @@ lib_tests = [
>> 'igt_dynamic_subtests',
>> 'igt_edid',
>> 'igt_exit_handler',
>> + 'igt_facts',
>> 'igt_fork',
>> 'igt_fork_helper',
>> 'igt_hook',
>> diff --git a/runner/executor.c b/runner/executor.c
>> index ac73e1dde..d1eca3c05 100644
>> --- a/runner/executor.c
>> +++ b/runner/executor.c
>> @@ -30,6 +30,7 @@
>>
>> #include "igt_aux.h"
>> #include "igt_core.h"
>> +#include "igt_facts.h"
>> #include "igt_taints.h"
>> #include "igt_vec.h"
>> #include "executor.h"
>> @@ -2306,6 +2307,9 @@ bool execute(struct execute_state *state,
>> sigset_t sigmask;
>> double time_spent = 0.0;
>> bool status = true;
>> + char *last_test = NULL;
>> +
>> + igt_facts_lists_init();
>>
>> if (state->dry) {
>> outf("Dry run, not executing. Invoke igt_resume if you want to execute.\n");
>> @@ -2438,6 +2442,10 @@ bool execute(struct execute_state *state,
>> int result;
>> bool already_written = false;
>>
>> + /* Calls before running each test */
>> + igt_facts(last_test);
>> + last_test = entry_display_name(&job_list->entries[state->next]);
>> +
>> if (should_die_because_signal(sigfd)) {
>> status = false;
>> goto end;
>> @@ -2526,6 +2534,8 @@ bool execute(struct execute_state *state,
>> return execute(state, settings, job_list);
>> }
>> }
>> + /* Last call to collect facts after the last test runs */
>> + igt_facts(last_test);
>>
>> if ((timefd = openat(resdirfd, "endtime.txt", O_CREAT | O_WRONLY | O_EXCL, 0666)) >= 0) {
>> dprintf(timefd, "%f\n", timeofday_double());
>> diff --git a/tools/lsfacts.c b/tools/lsfacts.c
>> new file mode 100644
>> index 000000000..10dee0317
>> --- /dev/null
>> +++ b/tools/lsfacts.c
>> @@ -0,0 +1,25 @@
>> +// SPDX-License-Identifier: MIT
>> +// Copyright © 2024 Intel Corporation
>> +
>> +#include "igt.h"
>> +#include "igt_facts.h"
>> +
>> +/**
>> + * SECTION:lsfacts
>> + * @short_description: lsfacts
>> + * @title: lsfacts
>> + * @include: lsfacts.c
>> + *
>> + * # lsfacts
>> + *
>> + * Scan for igt-facts and print them on screen. Indicate if no facts are found.
>> + */
>> +int main(int argc, char *argv[])
>> +{
>> + igt_facts_lists_init();
>> +
>> + igt_facts("lsfacts");
>> +
>> + if (igt_facts_are_all_lists_empty())
>> + igt_info("No facts found...\n");
>> +}
>> diff --git a/tools/meson.build b/tools/meson.build
>> index 48c9a4b50..ff1b0ef90 100644
>> --- a/tools/meson.build
>> +++ b/tools/meson.build
>> @@ -42,6 +42,7 @@ tools_progs = [
>> 'intel_gem_info',
>> 'intel_gvtg_test',
>> 'dpcd_reg',
>> + 'lsfacts',
>> 'lsgpu',
>> 'power',
>> ]
>>
>
>
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking
2024-12-05 4:54 ` [PATCH i-g-t v10] " Peter Senna Tschudin
2024-12-05 9:08 ` Piatkowski, Dominik Karol
@ 2024-12-06 11:42 ` Kamil Konieczny
2024-12-06 13:16 ` Peter Senna Tschudin
1 sibling, 1 reply; 13+ messages in thread
From: Kamil Konieczny @ 2024-12-06 11:42 UTC (permalink / raw)
To: Peter Senna Tschudin
Cc: igt-dev@lists.freedesktop.org, Ryszard Knop, Janusz Krzysztofik,
Zbigniew Kempczyński, Lucas De Marchi, luciano.coelho,
nirmoy.das, stuart.summers, himal.prasad.ghimiray,
dominik.karol.piatkowski, katarzyna.piecielska, Petri Latvala,
Juha-Pekka Heikkila, Juha-Pekka Heikkila, Swati Sharma,
Jani Saarinen, Jani Nikula, Rob Clark, Rob Clark, Helen Koike,
Melissa Wen, Maíra Canal
Hi Peter,
On 2024-12-05 at 05:54:26 +0100, Peter Senna Tschudin wrote:
overall looks good, I have few nits, see below.
As for subject, it usally informs what changed, so "igt-runner fact checking"
is a little misleading. Please see my ask about that below.
> When using igt-runner, collect facts before each test and after the
> last test, and report when facts change. The facts are:
> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
> - Kernel taints: kernel.is_tainted.taint_warn: true
> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
>
> This change imposes little execution overhead and adds just a few
imho 'little overhead' is beause facts are only four but I agree
we could change implementation when its number grows.
> lines of logging. The facts will be printed on normal igt-runner
> output. Here is a real example from our CI showing
> hotreplug-lateclose changing the DRM card number and tainting the
> kernel on the abort path:
>
> [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose)
> [245.383596] Starting subtest: hotreplug-lateclose
> [249.843361] Aborting: Lockdep not active
> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
> [249.859075] Closing watchdogs
>
> CC: Ryszard Knop <ryszard.knop@intel.com>
> CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> CC: Lucas De Marchi <lucas.demarchi@intel.com>
> CC: luciano.coelho@intel.com
> CC: nirmoy.das@intel.com
> CC: stuart.summers@intel.com
> CC: himal.prasad.ghimiray@intel.com
> CC: dominik.karol.piatkowski@intel.com
> CC: katarzyna.piecielska@intel.com
+cc Petri + kms devs
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Cc: Juha-Pekka Heikkila <juha-pekka.heikkila@intel.com>
Cc: Swati Sharma <swati2.sharma@intel.com>
Cc: Jani Saarinen <jani.saarinen@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
+other gpu devs
Cc: Rob Clark <robdclark@gmail.com>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Helen Koike <helen.koike@collabora.com>
+drm ci dev
Cc: Melissa Wen <mwen@igalia.com>
Cc: Maíra Canal <mcanal@igalia.com>
> Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
> ---
> Re-sending as the mailing list server rejected the message
> yesterday due to a probably full disk...
>
> v10:
> - fix memory leaks from asprintf (Thank you Dominik Karol!)
> - fix comments for consistency (Thank you Dominik Karol!)
>
> v9:
> - do not report new hardware when loading/unloading kmod changes
> the string of the GPU name. I accidentally reintroduced this
> issue when refactoring to use linked lists.
> - add tools/lsfacts: 9 lines of code that print either the facts
> or that no facts were found.
> - fix code comments describing functions
> - fix white space issues
>
> v8:
> - fix white space issues
>
> v7:
> - refactor to use linked lists provided by igt_lists
> - Added function arguments to code comments
> - updated commit message
>
> v6:
> - sort includes in igt_facts.c alphabetically
> - add facts for kernel taints using igt_kernel_tainted() and
> igt_explain_taints()
>
> v5:
> - fix the broken patch format from v4
>
> v4:
> - fix a bug on delete_fact()
> - drop glib and calls to g_ functions
> - change commit message to indicate that report only on fact changes
> - use consistent format for reporting changes
> - fix SPDX header format
>
> v3:
> - refreshed commit message
> - changed format SPDX string
> - removed license text
> - replace last_test assignment when null by two ternary operators
> - added function descriptions following example found elsewhere in
> the code
> - added igt_assert to catch failures to realloc()
>
> v2:
> - add lib/tests/igt_facts.c for basic unit testing
> - bugfix: do not report a new gpu when the driver changes the gpu name
> - bugfix: do not report the pci_id twice on the gpu name
>
> lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++
> lib/igt_facts.h | 47 +++
> lib/meson.build | 1 +
> lib/tests/igt_facts.c | 15 +
> lib/tests/meson.build | 1 +
Please split your patchset into three, add above files to
first patch:
[PATCH i-g-t v11 1/3] lib/igt_facts: add fact checking lib
> runner/executor.c | 10 +
[PATCH i-g-t v11 2/3] runner/executor: Check facts at tests run
> tools/lsfacts.c | 25 ++
> tools/meson.build | 1 +
[PATCH i-g-t v11 3/3] tools/lsfacts: Add facts listing tool
and it all with cover letter.
> 8 files changed, 855 insertions(+)
> create mode 100644 lib/igt_facts.c
> create mode 100644 lib/igt_facts.h
> create mode 100644 lib/tests/igt_facts.c
> create mode 100644 tools/lsfacts.c
>
> diff --git a/lib/igt_facts.c b/lib/igt_facts.c
> new file mode 100644
> index 000000000..4749d3417
> --- /dev/null
> +++ b/lib/igt_facts.c
> @@ -0,0 +1,755 @@
> +// SPDX-License-Identifier: MIT
> +// Copyright © 2024 Intel Corporation
Copyright is usally formatted in C-comment style /* comment */
to allow extending it (or adding Author: fields, etc) See other
C files for example tests/device_reset.c
So here:
/*
* Copyright © 2024 Intel Corporation
*/
Do the same in other new files.
Regards,
Kamil
> +
> +#include <ctype.h>
> +#include <libudev.h>
> +#include <stdio.h>
> +#include <sys/time.h>
> +#include <time.h>
> +
> +#include "igt_core.h"
> +#include "igt_device_scan.h"
> +#include "igt_facts.h"
> +#include "igt_kmod.h"
> +#include "igt_list.h"
> +#include "igt_taints.h"
> +
> +static struct igt_list_head igt_facts_list_drm_card_head;
> +static struct igt_list_head igt_facts_list_kmod_head;
> +static struct igt_list_head igt_facts_list_ktaint_head;
> +static struct igt_list_head igt_facts_list_pci_gpu_head;
> +
> +
> +/**
> + * igt_facts_lists_init:
> + *
> + * Initialize igt_facts linked lists.
> + *
> + * Returns: void
> + */
> +void igt_facts_lists_init(void)
> +{
> + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head);
> + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head);
> + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head);
> + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head);
> +}
> +
> +
> +/**
> + * igt_facts_log:
> + * @last_test: name of the test that triggered the fact
> + * @name: name of the fact
> + * @new_value: new value of the fact
> + * @old_value: old value of the fact
> + *
> + * Reports fact changes:
> + * - new fact: if old_value is NULL and new_value is not NULL
> + * - deleted fact: if new_value is NULL and old_value is not NULL
> + * - changed fact: if new_value is different from old_value
> + *
> + * Returns: void
> + */
> +static void igt_facts_log(const char *last_test, const char *name,
> + const char *new_value, const char *old_value)
> +{
...cut...
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking
2024-12-06 11:42 ` Kamil Konieczny
@ 2024-12-06 13:16 ` Peter Senna Tschudin
2024-12-06 16:46 ` Kamil Konieczny
0 siblings, 1 reply; 13+ messages in thread
From: Peter Senna Tschudin @ 2024-12-06 13:16 UTC (permalink / raw)
To: Kamil Konieczny, igt-dev@lists.freedesktop.org, Ryszard Knop,
Janusz Krzysztofik, Zbigniew Kempczyński, Lucas De Marchi,
luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray,
dominik.karol.piatkowski, katarzyna.piecielska, Petri Latvala,
Juha-Pekka Heikkila, Juha-Pekka Heikkila, Swati Sharma,
Jani Saarinen, Jani Nikula, Rob Clark, Rob Clark, Helen Koike,
Melissa Wen, Maíra Canal
Hi Kamil,
Thank you for your comments. I appreciate the opportunity to clarify and address
your concerns.
On 06.12.2024 12:42, Kamil Konieczny wrote:
> Hi Peter,
> On 2024-12-05 at 05:54:26 +0100, Peter Senna Tschudin wrote:
>
> overall looks good, I have few nits, see below.
Thank you! Our CI tested it thoroughly, it creates value, and it works well.
It is time to merge it.
>
> As for subject, it usally informs what changed, so "igt-runner fact checking"
> is a little misleading. Please see my ask about that below.
Please explain what is misleading about the subject. It looks good to me.
>
>> When using igt-runner, collect facts before each test and after the
>> last test, and report when facts change. The facts are:
>> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
>> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
>> - Kernel taints: kernel.is_tainted.taint_warn: true
>> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
>>
>> This change imposes little execution overhead and adds just a few
>
> imho 'little overhead' is beause facts are only four but I agree
> we could change implementation when its number grows.
Little overhead is because of how I implemented the code. The execution
is efficient and produces as little text output as possible.
>
>> lines of logging. The facts will be printed on normal igt-runner
>> output. Here is a real example from our CI showing
>> hotreplug-lateclose changing the DRM card number and tainting the
>> kernel on the abort path:
>>
>> [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose)
>> [245.383596] Starting subtest: hotreplug-lateclose
>> [249.843361] Aborting: Lockdep not active
>> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
>> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
>> [249.859075] Closing watchdogs
>>
>> CC: Ryszard Knop <ryszard.knop@intel.com>
>> CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
>> CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
>> CC: Lucas De Marchi <lucas.demarchi@intel.com>
>> CC: luciano.coelho@intel.com
>> CC: nirmoy.das@intel.com
>> CC: stuart.summers@intel.com
>> CC: himal.prasad.ghimiray@intel.com
>> CC: dominik.karol.piatkowski@intel.com
>> CC: katarzyna.piecielska@intel.com
>
> +cc Petri + kms devs
> Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> Cc: Juha-Pekka Heikkila <juha-pekka.heikkila@intel.com>
> Cc: Swati Sharma <swati2.sharma@intel.com>
> Cc: Jani Saarinen <jani.saarinen@intel.com>
> Cc: Jani Nikula <jani.nikula@linux.intel.com>
>
> +other gpu devs
> Cc: Rob Clark <robdclark@gmail.com>
> Cc: Rob Clark <robdclark@chromium.org>
> Cc: Helen Koike <helen.koike@collabora.com>
>
> +drm ci dev
> Cc: Melissa Wen <mwen@igalia.com>
> Cc: Maíra Canal <mcanal@igalia.com>
That is a lot of people to CC, but ok.
>
>> Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
>> ---
>> Re-sending as the mailing list server rejected the message
>> yesterday due to a probably full disk...
>>
>> v10:
>> - fix memory leaks from asprintf (Thank you Dominik Karol!)
>> - fix comments for consistency (Thank you Dominik Karol!)
>>
>> v9:
>> - do not report new hardware when loading/unloading kmod changes
>> the string of the GPU name. I accidentally reintroduced this
>> issue when refactoring to use linked lists.
>> - add tools/lsfacts: 9 lines of code that print either the facts
>> or that no facts were found.
>> - fix code comments describing functions
>> - fix white space issues
>>
>> v8:
>> - fix white space issues
>>
>> v7:
>> - refactor to use linked lists provided by igt_lists
>> - Added function arguments to code comments
>> - updated commit message
>>
>> v6:
>> - sort includes in igt_facts.c alphabetically
>> - add facts for kernel taints using igt_kernel_tainted() and
>> igt_explain_taints()
>>
>> v5:
>> - fix the broken patch format from v4
>>
>> v4:
>> - fix a bug on delete_fact()
>> - drop glib and calls to g_ functions
>> - change commit message to indicate that report only on fact changes
>> - use consistent format for reporting changes
>> - fix SPDX header format
>>
>> v3:
>> - refreshed commit message
>> - changed format SPDX string
>> - removed license text
>> - replace last_test assignment when null by two ternary operators
>> - added function descriptions following example found elsewhere in
>> the code
>> - added igt_assert to catch failures to realloc()
>>
>> v2:
>> - add lib/tests/igt_facts.c for basic unit testing
>> - bugfix: do not report a new gpu when the driver changes the gpu name
>> - bugfix: do not report the pci_id twice on the gpu name
>>
>> lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++
>> lib/igt_facts.h | 47 +++
>> lib/meson.build | 1 +
>> lib/tests/igt_facts.c | 15 +
>> lib/tests/meson.build | 1 +
>
> Please split your patchset into three, add above files to
> first patch:
>
> [PATCH i-g-t v11 1/3] lib/igt_facts: add fact checking lib
>
>> runner/executor.c | 10 +
>
> [PATCH i-g-t v11 2/3] runner/executor: Check facts at tests run
>
>> tools/lsfacts.c | 25 ++
>> tools/meson.build | 1 +
>
> [PATCH i-g-t v11 3/3] tools/lsfacts: Add facts listing tool
>
> and it all with cover letter.
No. Why would I do that? Both the change to runner/executor.c and
the new tool are very small, not justifying a separate patch. They
also match well in functionality also not justifying a different
patch. What gain are you trying to achieve here? I really like a
single patch in this case.
>
>> 8 files changed, 855 insertions(+)
>> create mode 100644 lib/igt_facts.c
>> create mode 100644 lib/igt_facts.h
>> create mode 100644 lib/tests/igt_facts.c
>> create mode 100644 tools/lsfacts.c
>>
>> diff --git a/lib/igt_facts.c b/lib/igt_facts.c
>> new file mode 100644
>> index 000000000..4749d3417
>> --- /dev/null
>> +++ b/lib/igt_facts.c
>> @@ -0,0 +1,755 @@
>> +// SPDX-License-Identifier: MIT
>> +// Copyright © 2024 Intel Corporation
>
> Copyright is usally formatted in C-comment style /* comment */
> to allow extending it (or adding Author: fields, etc) See other
> C files for example tests/device_reset.c
> So here:
>
> /*
> * Copyright © 2024 Intel Corporation
> */
>
> Do the same in other new files.
No sorry, this is because of you asked me to please kernel's
checkpatch.pl. I am not sending another revision for this nit
that changes the comment style. Have you seen how the comment
style varies in our code base, including recent commits? Why
being so picky with my patch?
That being said, if you want to implement these changes yourself
please go for it.
Thank you,
Peter
>
> Regards,
> Kamil
>
>> +
>> +#include <ctype.h>
>> +#include <libudev.h>
>> +#include <stdio.h>
>> +#include <sys/time.h>
>> +#include <time.h>
>> +
>> +#include "igt_core.h"
>> +#include "igt_device_scan.h"
>> +#include "igt_facts.h"
>> +#include "igt_kmod.h"
>> +#include "igt_list.h"
>> +#include "igt_taints.h"
>> +
>> +static struct igt_list_head igt_facts_list_drm_card_head;
>> +static struct igt_list_head igt_facts_list_kmod_head;
>> +static struct igt_list_head igt_facts_list_ktaint_head;
>> +static struct igt_list_head igt_facts_list_pci_gpu_head;
>> +
>> +
>> +/**
>> + * igt_facts_lists_init:
>> + *
>> + * Initialize igt_facts linked lists.
>> + *
>> + * Returns: void
>> + */
>> +void igt_facts_lists_init(void)
>> +{
>> + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head);
>> + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head);
>> + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head);
>> + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head);
>> +}
>> +
>> +
>> +/**
>> + * igt_facts_log:
>> + * @last_test: name of the test that triggered the fact
>> + * @name: name of the fact
>> + * @new_value: new value of the fact
>> + * @old_value: old value of the fact
>> + *
>> + * Reports fact changes:
>> + * - new fact: if old_value is NULL and new_value is not NULL
>> + * - deleted fact: if new_value is NULL and old_value is not NULL
>> + * - changed fact: if new_value is different from old_value
>> + *
>> + * Returns: void
>> + */
>> +static void igt_facts_log(const char *last_test, const char *name,
>> + const char *new_value, const char *old_value)
>> +{
>
> ...cut...
>
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking
2024-12-06 13:16 ` Peter Senna Tschudin
@ 2024-12-06 16:46 ` Kamil Konieczny
0 siblings, 0 replies; 13+ messages in thread
From: Kamil Konieczny @ 2024-12-06 16:46 UTC (permalink / raw)
To: Peter Senna Tschudin
Cc: igt-dev@lists.freedesktop.org, Ryszard Knop, Janusz Krzysztofik,
Zbigniew Kempczyński, Lucas De Marchi, luciano.coelho,
nirmoy.das, stuart.summers, himal.prasad.ghimiray,
dominik.karol.piatkowski, katarzyna.piecielska, Petri Latvala,
Juha-Pekka Heikkila, Juha-Pekka Heikkila, Swati Sharma,
Jani Saarinen, Jani Nikula, Rob Clark, Rob Clark, Helen Koike,
Melissa Wen, Maíra Canal
Hi Peter,
On 2024-12-06 at 14:16:27 +0100, Peter Senna Tschudin wrote:
> Hi Kamil,
>
> Thank you for your comments. I appreciate the opportunity to clarify and address
> your concerns.
>
> On 06.12.2024 12:42, Kamil Konieczny wrote:
> > Hi Peter,
> > On 2024-12-05 at 05:54:26 +0100, Peter Senna Tschudin wrote:
> >
> > overall looks good, I have few nits, see below.
>
> Thank you! Our CI tested it thoroughly, it creates value, and it works well.
> It is time to merge it.
>
> >
> > As for subject, it usally informs what changed, so "igt-runner fact checking"
> > is a little misleading. Please see my ask about that below.
>
> Please explain what is misleading about the subject. It looks good to me.
>
Subject suggest that this is a change only for runner, while it
also adds a tool lsfacts.
> >
> >> When using igt-runner, collect facts before each test and after the
s/igt-runner/igt_runner/
> >> last test, and report when facts change. The facts are:
Add newline here.
> >> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
> >> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
> >> - Kernel taints: kernel.is_tainted.taint_warn: true
> >> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
imho here it should be general description what it will print, like:
- GPUs on PCI bus: 'hardware PCI bus' 'GPU name'
- Associations between PCI GPU and DRM card: 'PCI bus': 'card number'
- Kernel taints: 'true' or 'false'
- GPU kernel modules loaded: 'driver name'
and then you could provide an example.
> >>
> >> This change imposes little execution overhead and adds just a few
> >
> > imho 'little overhead' is beause facts are only four but I agree
> > we could change implementation when its number grows.
>
> Little overhead is because of how I implemented the code. The execution
> is efficient and produces as little text output as possible.
>
> >
> >> lines of logging. The facts will be printed on normal igt-runner
> >> output. Here is a real example from our CI showing
> >> hotreplug-lateclose changing the DRM card number and tainting the
> >> kernel on the abort path:
> >>
> >> [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose)
> >> [245.383596] Starting subtest: hotreplug-lateclose
> >> [249.843361] Aborting: Lockdep not active
> >> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
> >> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
> >> [249.859075] Closing watchdogs
Where is a note about adding new tool? Patch should describe
all changes. I would prefer to see such note at beginning of
description. Btw splitting this into few functional patches
will gain that naturally.
> >>
> >> CC: Ryszard Knop <ryszard.knop@intel.com>
> >> CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> >> CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> >> CC: Lucas De Marchi <lucas.demarchi@intel.com>
> >> CC: luciano.coelho@intel.com
> >> CC: nirmoy.das@intel.com
> >> CC: stuart.summers@intel.com
> >> CC: himal.prasad.ghimiray@intel.com
> >> CC: dominik.karol.piatkowski@intel.com
> >> CC: katarzyna.piecielska@intel.com
> >
> > +cc Petri + kms devs
> > Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> > Cc: Juha-Pekka Heikkila <juha-pekka.heikkila@intel.com>
> > Cc: Swati Sharma <swati2.sharma@intel.com>
> > Cc: Jani Saarinen <jani.saarinen@intel.com>
> > Cc: Jani Nikula <jani.nikula@linux.intel.com>
> >
> > +other gpu devs
> > Cc: Rob Clark <robdclark@gmail.com>
> > Cc: Rob Clark <robdclark@chromium.org>
> > Cc: Helen Koike <helen.koike@collabora.com>
> >
> > +drm ci dev
> > Cc: Melissa Wen <mwen@igalia.com>
> > Cc: Maíra Canal <mcanal@igalia.com>
>
> That is a lot of people to CC, but ok.
>
Well I do not know if it will help other igt_runner users,
so it is better to ask.
> >
> >> Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
> >> ---
> >> Re-sending as the mailing list server rejected the message
> >> yesterday due to a probably full disk...
> >>
> >> v10:
> >> - fix memory leaks from asprintf (Thank you Dominik Karol!)
> >> - fix comments for consistency (Thank you Dominik Karol!)
> >>
> >> v9:
> >> - do not report new hardware when loading/unloading kmod changes
> >> the string of the GPU name. I accidentally reintroduced this
> >> issue when refactoring to use linked lists.
> >> - add tools/lsfacts: 9 lines of code that print either the facts
> >> or that no facts were found.
> >> - fix code comments describing functions
> >> - fix white space issues
> >>
> >> v8:
> >> - fix white space issues
> >>
> >> v7:
> >> - refactor to use linked lists provided by igt_lists
> >> - Added function arguments to code comments
> >> - updated commit message
> >>
> >> v6:
> >> - sort includes in igt_facts.c alphabetically
> >> - add facts for kernel taints using igt_kernel_tainted() and
> >> igt_explain_taints()
> >>
> >> v5:
> >> - fix the broken patch format from v4
> >>
> >> v4:
> >> - fix a bug on delete_fact()
> >> - drop glib and calls to g_ functions
> >> - change commit message to indicate that report only on fact changes
> >> - use consistent format for reporting changes
> >> - fix SPDX header format
> >>
> >> v3:
> >> - refreshed commit message
> >> - changed format SPDX string
> >> - removed license text
> >> - replace last_test assignment when null by two ternary operators
> >> - added function descriptions following example found elsewhere in
> >> the code
> >> - added igt_assert to catch failures to realloc()
> >>
> >> v2:
> >> - add lib/tests/igt_facts.c for basic unit testing
> >> - bugfix: do not report a new gpu when the driver changes the gpu name
> >> - bugfix: do not report the pci_id twice on the gpu name
> >>
> >> lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++
> >> lib/igt_facts.h | 47 +++
> >> lib/meson.build | 1 +
> >> lib/tests/igt_facts.c | 15 +
> >> lib/tests/meson.build | 1 +
> >
> > Please split your patchset into three, add above files to
> > first patch:
> >
> > [PATCH i-g-t v11 1/3] lib/igt_facts: add fact checking lib
> >
> >> runner/executor.c | 10 +
> >
> > [PATCH i-g-t v11 2/3] runner/executor: Check facts at tests run
> >
> >> tools/lsfacts.c | 25 ++
> >> tools/meson.build | 1 +
> >
> > [PATCH i-g-t v11 3/3] tools/lsfacts: Add facts listing tool
> >
> > and it all with cover letter.
>
> No. Why would I do that? Both the change to runner/executor.c and
> the new tool are very small, not justifying a separate patch. They
> also match well in functionality also not justifying a different
> patch. What gain are you trying to achieve here? I really like a
> single patch in this case.
>
You could split it in other ways, first adding lib and tool lsfacts,
then small patch change in runner/executor.c
It is also about a risk, change in igt_runner is a main risk and in
case we need a revert, I would prefer to revert small patch, one
from 2 or 3 rather than one big that added all. Additional gain is
more clear git log history.
> >
> >> 8 files changed, 855 insertions(+)
> >> create mode 100644 lib/igt_facts.c
> >> create mode 100644 lib/igt_facts.h
> >> create mode 100644 lib/tests/igt_facts.c
> >> create mode 100644 tools/lsfacts.c
> >>
> >> diff --git a/lib/igt_facts.c b/lib/igt_facts.c
> >> new file mode 100644
> >> index 000000000..4749d3417
> >> --- /dev/null
> >> +++ b/lib/igt_facts.c
> >> @@ -0,0 +1,755 @@
> >> +// SPDX-License-Identifier: MIT
> >> +// Copyright © 2024 Intel Corporation
> >
> > Copyright is usally formatted in C-comment style /* comment */
> > to allow extending it (or adding Author: fields, etc) See other
> > C files for example tests/device_reset.c
> > So here:
> >
> > /*
> > * Copyright © 2024 Intel Corporation
> > */
> >
> > Do the same in other new files.
>
> No sorry, this is because of you asked me to please kernel's
> checkpatch.pl. I am not sending another revision for this nit
> that changes the comment style. Have you seen how the comment
> style varies in our code base, including recent commits? Why
> being so picky with my patch?
>
I try to look into new patches but I cannot take care of every one.
checkpatch.pl checks SPDX line, while 'Copyright' is in separate
comment. Check for yourself:
grep Copyright lib/*c
grep Copyright lib/*h
None of it has '//' as a starting comment.
And I checked my replay to your v1, I asked about dropping text
_after_ Copyright, I did not wrote any about using '//' for it.
> That being said, if you want to implement these changes yourself
> please go for it.
Well, no. I just skimmed over your changes and see there is
list of gpu drivers there:
+const char *igt_fact_kmod_list[] = {
+ "amdgpu",
+ "i915",
+ "nouveau",
+ "radeon",
+ "xe",
+ "\0"
+};
It should be in sync with lib/drmtest.c
} modules[] = {
{ DRIVER_AMDGPU, "amdgpu" },
{ DRIVER_INTEL, "i915", modprobe_i915 },
{ DRIVER_MSM, "msm" },
{ DRIVER_PANFROST, "panfrost" },
{ DRIVER_V3D, "v3d" },
{ DRIVER_VC4, "vc4" },
{ DRIVER_VGEM, "vgem" },
{ DRIVER_VMWGFX, "vmwgfx" },
{ DRIVER_XE, "xe" },
{}
or at least add a comment why it is not.
Regards,
Kamil
>
> Thank you,
>
> Peter
>
> >
> > Regards,
> > Kamil
> >
> >> +
> >> +#include <ctype.h>
> >> +#include <libudev.h>
> >> +#include <stdio.h>
> >> +#include <sys/time.h>
> >> +#include <time.h>
> >> +
> >> +#include "igt_core.h"
> >> +#include "igt_device_scan.h"
> >> +#include "igt_facts.h"
> >> +#include "igt_kmod.h"
> >> +#include "igt_list.h"
> >> +#include "igt_taints.h"
> >> +
> >> +static struct igt_list_head igt_facts_list_drm_card_head;
> >> +static struct igt_list_head igt_facts_list_kmod_head;
> >> +static struct igt_list_head igt_facts_list_ktaint_head;
> >> +static struct igt_list_head igt_facts_list_pci_gpu_head;
> >> +
> >> +
> >> +/**
> >> + * igt_facts_lists_init:
> >> + *
> >> + * Initialize igt_facts linked lists.
> >> + *
> >> + * Returns: void
> >> + */
> >> +void igt_facts_lists_init(void)
> >> +{
> >> + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head);
> >> + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head);
> >> + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head);
> >> + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head);
> >> +}
> >> +
> >> +
> >> +/**
> >> + * igt_facts_log:
> >> + * @last_test: name of the test that triggered the fact
> >> + * @name: name of the fact
> >> + * @new_value: new value of the fact
> >> + * @old_value: old value of the fact
> >> + *
> >> + * Reports fact changes:
> >> + * - new fact: if old_value is NULL and new_value is not NULL
> >> + * - deleted fact: if new_value is NULL and old_value is not NULL
> >> + * - changed fact: if new_value is different from old_value
> >> + *
> >> + * Returns: void
> >> + */
> >> +static void igt_facts_log(const char *last_test, const char *name,
> >> + const char *new_value, const char *old_value)
> >> +{
> >
> > ...cut...
> >
> >> --
> >> 2.34.1
> >>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking
2024-12-06 5:45 ` Peter Senna Tschudin
@ 2024-12-09 9:17 ` Janusz Krzysztofik
2024-12-09 11:06 ` Peter Senna Tschudin
0 siblings, 1 reply; 13+ messages in thread
From: Janusz Krzysztofik @ 2024-12-09 9:17 UTC (permalink / raw)
To: igt-dev@lists.freedesktop.org, Peter Senna Tschudin
Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi,
luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray,
dominik.karol.piatkowski, katarzyna.piecielska,
Janusz Krzysztofik
On Friday, 6 December 2024 06:45:31 CET Peter Senna Tschudin wrote:
> Hi Janusz,
>
> Thank you for your detailed comments. I appreciate the opportunity
> to clarify and address your concerns.
>
> On 05.12.2024 15:05, Janusz Krzysztofik wrote:
> > Hi Peter,
> >
> > On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote:
> >> When using igt-runner, collect facts before each test and after the
> >> last test, and report when facts change. The facts are:
> >> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
> >> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
> >> - Kernel taints: kernel.is_tainted.taint_warn: true
> >> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
> >>
> >> This change imposes little execution overhead and adds just a few
> >> lines of logging. The facts will be printed on normal igt-runner
> >> output. Here is a real example from our CI shwoing
> >> hotreplug-lateclose changing the DRM card number
> >
> > Since you give that as an example of how helpful your facts can be, and follow
> > that with a kernel taint example, that may indicate you think, and users of
> > your facts may then be mislead having that read, that the taint was related to
> > the change of card number, while both had nothing to do with each other.
>
> Let’s take a step back to define the purpose and scope of igt-facts:
> - Definition of a fact from the dictionary: A fact is an objectively verifiable
> piece of information.
> - Purpose of igt-facts: Track which tests cause changes to the facts.
>
> The operation is straightforward: facts are collected before and after each test,
> and any differences are logged. Here’s an example showing a fact change and a new
> fact after running hotreplug-lateclose:
>
> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
>
> This output highlights the facts without implying causation between them. The
> tool(and my commit message) neither explains relationships between facts nor
> misleads users into assuming causation.
For me your commit message does.
Can you please provide a full list of "facts" your code is supposed to handle?
Can you please explain why you selected just those "facts", not others?
Thanks,
Janusz
>
> >
> > Please add something like ', which is expected,' to your description. Changed
> > card number is expected, and that's nothing wrong with that. The old, still
> > open instance of the driver still exists. It is expected to be already
> > decoupled from hardware, but it still occupies its minor device number. Then,
> > new instance of the driver, attached to the same hardware, gets first free
> > minor number. Refresh of IGT device filter before health check can perfectly
> > handle that case.
>
> The igt-facts tool reports changes without qualifying them as expected or
> unexpected. For example, the change in the DRM card number is logged as a fact,
> irrespective of its expected nature.
>
> >
> >> and tainting the
> >> kernel on the abort path:
> >
> > No, hotreplug-lateclose doesn't taint the kernel. That's wrong conclusion from
> > what was reported, or wrong wording at least. Kernel taint (or lockdep not
> > active) must have been a result of driver issues (or maybe well known
> > limitations) exposed by the test. Then igt_runner took the abort path since
> > it detected those unhealthy conditions. Again, users of your facts may be
> > mislead if your messages can really suggest what you tell us about what you
> > think has happened.
>
> The kernel taint reported is a fact observed after the test. While its root cause
> lies within the driver or other system components, this is outside the scope of
> igt-facts. The report simply reflects what changed during the test.
>
> To summarize:
> - igt-facts is a factual reporting tool. It does not establish causation or
> interpret changes.
> - Both the DRM card number change and kernel taint were factual observations post
> hotreplug-lateclose.
>
> I hope this clarifies the intent and operation of igt-facts. Please let me know if
> further discussion is needed.
>
> Thank you,
>
> Peter
> >
> > The whole idea of testing i915 resistance to hot unplug scenarios came from
> > perceived cases of VFs potentially disappearing from VMs. Some significant
> > effort was put on that feature of the driver, but since the 'hot' test
> > variants were never unblocked in CI, things tended to get worse and worse over
> > time while new driver features that didn't care for hot unplug capability were
> > added.
> >
> > Thanks,
> > Janusz
> >
> >>
> >> [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose)
> >> [245.383596] Starting subtest: hotreplug-lateclose
> >> [249.843361] Aborting: Lockdep not active
> >> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
> >> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
> >> [249.859075] Closing watchdogs
> >>
> >> CC: Ryszard Knop <ryszard.knop@intel.com>
> >> CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> >> CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
> >> CC: Lucas De Marchi <lucas.demarchi@intel.com>
> >> CC: luciano.coelho@intel.com
> >> CC: nirmoy.das@intel.com
> >> CC: stuart.summers@intel.com
> >> CC: himal.prasad.ghimiray@intel.com
> >> CC: dominik.karol.piatkowski@intel.com
> >> CC: katarzyna.piecielska@intel.com
> >> Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
> >> ---
> >> v10:
> >> - fix memory leaks from asprintf (Thank you Dominik Karol!)
> >> - fix comments for consistency (Thank you Dominik Karol!)
> >>
> >> v9:
> >> - do not report new hardware when loading/unloading kmod changes
> >> the string of the GPU name. I accidentally reintroduced this
> >> issue when refactoring to use linked lists.
> >> - add tools/lsfacts: 9 lines of code that print either the facts
> >> or that no facts were found.
> >> - fix code comments describing functions
> >> - fix white space issues
> >>
> >> v8:
> >> - fix white space issues
> >>
> >> v7:
> >> - refactor to use linked lists provided by igt_lists
> >> - Added function arguments to code comments
> >> - updated commit message
> >>
> >> v6:
> >> - sort includes in igt_facts.c alphabetically
> >> - add facts for kernel taints using igt_kernel_tainted() and
> >> igt_explain_taints()
> >>
> >> v5:
> >> - fix the broken patch format from v4
> >>
> >> v4:
> >> - fix a bug on delete_fact()
> >> - drop glib and calls to g_ functions
> >> - change commit message to indicate that report only on fact changes
> >> - use consistent format for reporting changes
> >> - fix SPDX header format
> >>
> >> v3:
> >> - refreshed commit message
> >> - changed format SPDX string
> >> - removed license text
> >> - replace last_test assignment when null by two ternary operators
> >> - added function descriptions following example found elsewhere in
> >> the code
> >> - added igt_assert to catch failures to realloc()
> >>
> >> v2:
> >> - add lib/tests/igt_facts.c for basic unit testing
> >> - bugfix: do not report a new gpu when the driver changes the gpu name
> >> - bugfix: do not report the pci_id twice on the gpu name
> >>
> >> lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++
> >> lib/igt_facts.h | 47 +++
> >> lib/meson.build | 1 +
> >> lib/tests/igt_facts.c | 15 +
> >> lib/tests/meson.build | 1 +
> >> runner/executor.c | 10 +
> >> tools/lsfacts.c | 25 ++
> >> tools/meson.build | 1 +
> >> 8 files changed, 855 insertions(+)
> >> create mode 100644 lib/igt_facts.c
> >> create mode 100644 lib/igt_facts.h
> >> create mode 100644 lib/tests/igt_facts.c
> >> create mode 100644 tools/lsfacts.c
> >>
> >> diff --git a/lib/igt_facts.c b/lib/igt_facts.c
> >> new file mode 100644
> >> index 000000000..4749d3417
> >> --- /dev/null
> >> +++ b/lib/igt_facts.c
> >> @@ -0,0 +1,755 @@
> >> +// SPDX-License-Identifier: MIT
> >> +// Copyright © 2024 Intel Corporation
> >> +
> >> +#include <ctype.h>
> >> +#include <libudev.h>
> >> +#include <stdio.h>
> >> +#include <sys/time.h>
> >> +#include <time.h>
> >> +
> >> +#include "igt_core.h"
> >> +#include "igt_device_scan.h"
> >> +#include "igt_facts.h"
> >> +#include "igt_kmod.h"
> >> +#include "igt_list.h"
> >> +#include "igt_taints.h"
> >> +
> >> +static struct igt_list_head igt_facts_list_drm_card_head;
> >> +static struct igt_list_head igt_facts_list_kmod_head;
> >> +static struct igt_list_head igt_facts_list_ktaint_head;
> >> +static struct igt_list_head igt_facts_list_pci_gpu_head;
> >> +
> >> +
> >> +/**
> >> + * igt_facts_lists_init:
> >> + *
> >> + * Initialize igt_facts linked lists.
> >> + *
> >> + * Returns: void
> >> + */
> >> +void igt_facts_lists_init(void)
> >> +{
> >> + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head);
> >> + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head);
> >> + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head);
> >> + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head);
> >> +}
> >> +
> >> +
> >> +/**
> >> + * igt_facts_log:
> >> + * @last_test: name of the test that triggered the fact
> >> + * @name: name of the fact
> >> + * @new_value: new value of the fact
> >> + * @old_value: old value of the fact
> >> + *
> >> + * Reports fact changes:
> >> + * - new fact: if old_value is NULL and new_value is not NULL
> >> + * - deleted fact: if new_value is NULL and old_value is not NULL
> >> + * - changed fact: if new_value is different from old_value
> >> + *
> >> + * Returns: void
> >> + */
> >> +static void igt_facts_log(const char *last_test, const char *name,
> >> + const char *new_value, const char *old_value)
> >> +{
> >> + struct timespec uptime_ts;
> >> + char *uptime = NULL;
> >> + const char *before_tests = "before any test";
> >> +
> >> + if (old_value == NULL && new_value == NULL)
> >> + return;
> >> +
> >> + if (clock_gettime(CLOCK_BOOTTIME, &uptime_ts) != 0)
> >> + return;
> >> +
> >> + asprintf(&uptime,
> >> + "%ld.%06ld",
> >> + uptime_ts.tv_sec,
> >> + uptime_ts.tv_nsec / 1000);
> >> +
> >> + /* New fact */
> >> + if (old_value == NULL && new_value != NULL) {
> >> + igt_info("[%s] [FACT %s] new: %s: %s\n",
> >> + uptime,
> >> + last_test ? last_test : before_tests,
> >> + name,
> >> + new_value);
> >> + goto out;
> >> + }
> >> +
> >> + /* Update fact */
> >> + if (old_value != NULL && new_value != NULL) {
> >> + igt_info("[%s] [FACT %s] changed: %s: %s -> %s\n",
> >> + uptime,
> >> + last_test ? last_test : before_tests,
> >> + name,
> >> + old_value,
> >> + new_value);
> >> + goto out;
> >> + }
> >> +
> >> + /* Deleted fact */
> >> + if (old_value != NULL && new_value == NULL) {
> >> + igt_info("[%s] [FACT %s] deleted: %s: %s\n",
> >> + uptime,
> >> + last_test ? last_test : before_tests,
> >> + name,
> >> + old_value);
> >> + goto out;
> >> + }
> >> +
> >> +out:
> >> + free(uptime);
> >> +}
> >> +
> >> +/**
> >> + * igt_facts_list_get:
> >> + * @name: name of the fact to be added
> >> + * @head: head of the list
> >> + *
> >> + * Get a fact from the list.
> >> + *
> >> + * Returns: pointer to the fact if found, NULL otherwise
> >> + *
> >> + */
> >> +static igt_fact *igt_facts_list_get(const char *name,
> >> + struct igt_list_head *head)
> >> +{
> >> + igt_fact *fact = NULL;
> >> +
> >> + if (igt_list_empty(head))
> >> + return NULL;
> >> +
> >> + igt_list_for_each_entry(fact, head, link) {
> >> + if (strcmp(fact->name, name) == 0)
> >> + return fact;
> >> + }
> >> + return NULL;
> >> +}
> >> +
> >> +/**
> >> + * igt_facts_list_del:
> >> + * @name: name of the fact to be added
> >> + * @head: head of the list
> >> + * @last_test: name of the last test
> >> + * @log: bool indicating if the delete operation should be logged
> >> + *
> >> + * Delete a fact from the list.
> >> + *
> >> + * Returns: bool indicating if fact was deleted from the list
> >> + *
> >> + */
> >> +static bool igt_facts_list_del(const char *name,
> >> + struct igt_list_head *head,
> >> + const char *last_test,
> >> + bool log)
> >> +{
> >> + igt_fact *fact = NULL;
> >> +
> >> + if (igt_list_empty(head))
> >> + return false;
> >> +
> >> + igt_list_for_each_entry(fact, head, link) {
> >> + if (strcmp(fact->name, name) == 0) {
> >> + if (log)
> >> + igt_facts_log(last_test, fact->name,
> >> + NULL, fact->value);
> >> +
> >> + igt_list_del(&fact->link);
> >> + free(fact->name);
> >> + free(fact->value);
> >> + free(fact->last_test);
> >> + free(fact);
> >> + return true;
> >> + }
> >> + }
> >> + return false;
> >> +}
> >> +
> >> +/**
> >> + * igt_facts_list_add:
> >> + * @name: name of the fact to be added
> >> + * @value: value of the fact to be added
> >> + * @last_test: name of the last test
> >> + * @head: head of the list
> >> + *
> >> + * Returns: bool indicating if fact was added to the list
> >> + *
> >> + */
> >> +static bool igt_facts_list_add(const char *name,
> >> + const char *value,
> >> + const char *last_test,
> >> + struct igt_list_head *head)
> >> +{
> >> + igt_fact *new_fact = NULL, *old_fact = NULL;
> >> + bool logged = false;
> >> +
> >> + if (name == NULL || value == NULL)
> >> + return false;
> >> +
> >> + old_fact = igt_facts_list_get(name, head);
> >> + if (old_fact) {
> >> + if (strcmp(old_fact->value, value) == 0) {
> >> + old_fact->present = true;
> >> + return false;
> >> + }
> >> + igt_facts_log(last_test, name, value, old_fact->value);
> >> + logged = true;
> >> + igt_facts_list_del(name, head, last_test, false);
> >> + }
> >> +
> >> + new_fact = malloc(sizeof(igt_fact));
> >> + if (new_fact == NULL)
> >> + return false;
> >> +
> >> + new_fact->name = strdup(name);
> >> + new_fact->value = strdup(value);
> >> + new_fact->last_test = last_test ? strdup(last_test) : NULL;
> >> + new_fact->present = true;
> >> +
> >> + if (!logged)
> >> + igt_facts_log(last_test, name, value, NULL);
> >> +
> >> + igt_list_add(&new_fact->link, head);
> >> +
> >> + return true;
> >> +}
> >> +
> >> +/**
> >> + * igt_facts_list_mark:
> >> + * @head: head of the list
> >> + *
> >> + * Mark all facts in the list as not present. Opted for the mark and sweep
> >> + * design pattern due to its simplicity and efficiency.
> >> + *
> >> + * Returns: void
> >> + */
> >> +static void igt_facts_list_mark(struct igt_list_head *head)
> >> +{
> >> + igt_fact *fact = NULL;
> >> +
> >> + if (igt_list_empty(head))
> >> + return;
> >> +
> >> + igt_list_for_each_entry(fact, head, link)
> >> + fact->present = false;
> >> +}
> >> +
> >> +/**
> >> + * igt_facts_list_sweep:
> >> + * @head: head of the list
> >> + * @last_test: name of the last test
> >> + *
> >> + * Sweep the list and delete all facts that are not present. Opted for the mark
> >> + * and sweep design pattern due to its simplicity and efficiency.
> >> + *
> >> + * Returns: void
> >> + */
> >> +static void igt_facts_list_sweep(struct igt_list_head *head,
> >> + const char *last_test)
> >> +{
> >> + igt_fact *fact = NULL, *tmp = NULL;
> >> +
> >> + if (igt_list_empty(head))
> >> + return;
> >> +
> >> + igt_list_for_each_entry_safe(fact, tmp, head, link)
> >> + if (!fact->present)
> >> + igt_facts_list_del(fact->name, head, last_test, true);
> >> +}
> >> +
> >> +/**
> >> + * igt_facts_list_mark_and_sweep:
> >> + * @head: head of the list
> >> + *
> >> + * Clean up the list using mark and sweep. Opted for the mark and sweep
> >> + * design pattern due to its simplicity and efficiency.
> >> + *
> >> + * Returns: void
> >> + */
> >> +static void igt_facts_list_mark_and_sweep(struct igt_list_head *head)
> >> +{
> >> + igt_facts_list_mark(head);
> >> + igt_facts_list_sweep(head, NULL);
> >> +}
> >> +
> >> +/**
> >> + * igt_facts_are_all_lists_empty:
> >> + *
> >> + * Returns true if all lists are empty. Used by the tool lsfacts.
> >> + *
> >> + * Returns: bool
> >> + */
> >> +bool igt_facts_are_all_lists_empty(void)
> >> +{
> >> + return igt_list_empty(&igt_facts_list_drm_card_head) &&
> >> + igt_list_empty(&igt_facts_list_kmod_head) &&
> >> + igt_list_empty(&igt_facts_list_ktaint_head) &&
> >> + igt_list_empty(&igt_facts_list_pci_gpu_head);
> >> +}
> >> +
> >> +/**
> >> + * igt_facts_scan_pci_gpus:
> >> + * @last_test: name of the last test
> >> + *
> >> + * This function scans the pci bus for gpus using udev. It uses
> >> + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to
> >> + * update igt_facts_list_pci_gpu_head.
> >> + *
> >> + * Returns: void
> >> + */
> >> +static void igt_facts_scan_pci_gpus(const char *last_test)
> >> +{
> >> + static struct igt_list_head *head = &igt_facts_list_pci_gpu_head;
> >> + struct udev *udev = NULL;
> >> + struct udev_enumerate *enumerate = NULL;
> >> + struct udev_list_entry *devices, *dev_list_entry;
> >> + struct igt_device_card card;
> >> + char pcistr[10];
> >> + int ret;
> >> + char *factname = NULL;
> >> + char *factvalue = NULL;
> >> +
> >> + udev = udev_new();
> >> + if (!udev) {
> >> + igt_warn("Failed to create udev context\n");
> >> + return;
> >> + }
> >> +
> >> + enumerate = udev_enumerate_new(udev);
> >> + if (!enumerate) {
> >> + igt_warn("Failed to create udev enumerate\n");
> >> + udev_unref(udev);
> >> + return;
> >> + }
> >> +
> >> + ret = udev_enumerate_add_match_subsystem(enumerate, "pci");
> >> + if (ret < 0)
> >> + goto out;
> >> +
> >> + ret = udev_enumerate_add_match_property(enumerate,
> >> + "PCI_CLASS",
> >> + "30000");
> >> + if (ret < 0)
> >> + goto out;
> >> +
> >> + ret = udev_enumerate_add_match_property(enumerate,
> >> + "PCI_CLASS",
> >> + "38000");
> >> + if (ret < 0)
> >> + goto out;
> >> +
> >> + ret = udev_enumerate_scan_devices(enumerate);
> >> + if (ret < 0)
> >> + goto out;
> >> +
> >> + devices = udev_enumerate_get_list_entry(enumerate);
> >> + if (!devices)
> >> + goto out;
> >> +
> >> + igt_facts_list_mark(head);
> >> +
> >> + udev_list_entry_foreach(dev_list_entry, devices) {
> >> + const char *path;
> >> + struct udev_device *udev_dev;
> >> + struct udev_list_entry *entry;
> >> + char *model = NULL;
> >> + char *codename = NULL;
> >> + igt_fact *old_fact = NULL;
> >> +
> >> + path = udev_list_entry_get_name(dev_list_entry);
> >> + udev_dev = udev_device_new_from_syspath(udev, path);
> >> + if (!udev_dev)
> >> + continue;
> >> +
> >> + /* Strip path to only the content after the last / */
> >> + path = strrchr(path, '/');
> >> + if (path)
> >> + path++;
> >> + else
> >> + path = "unknown";
> >> +
> >> + strcpy(card.pci_slot_name, "-");
> >> +
> >> + entry = udev_device_get_properties_list_entry(udev_dev);
> >> + while (entry) {
> >> + const char *name = udev_list_entry_get_name(entry);
> >> + const char *value = udev_list_entry_get_value(entry);
> >> +
> >> + entry = udev_list_entry_get_next(entry);
> >> + if (!strcmp(name, "ID_MODEL_FROM_DATABASE"))
> >> + model = strdup(value);
> >> + else if (!strcmp(name, "PCI_ID"))
> >> + igt_assert_eq(sscanf(value, "%hx:%hx",
> >> + &card.pci_vendor,
> >> + &card.pci_device), 2);
> >> + }
> >> + snprintf(pcistr, sizeof(pcistr), "%04x:%04x",
> >> + card.pci_vendor, card.pci_device);
> >> + codename = igt_device_get_pretty_name(&card, false);
> >> +
> >> + /* Set codename to null if it is the same string as pci_id */
> >> + if (codename && strcmp(pcistr, codename) == 0) {
> >> + free(codename);
> >> + codename = NULL;
> >> + }
> >> + asprintf(&factname, "%s.%s", pci_gpu_fact, path);
> >> + asprintf(&factvalue,
> >> + "%s %s %s",
> >> + pcistr,
> >> + codename ? codename : "",
> >> + model ? model : "");
> >> +
> >> + /**
> >> + * Loading and unloading the kmod may change the human
> >> + * readeable string in value. Do not change value if the
> >> + * pci id is the same.
> >> + */
> >> + old_fact = igt_facts_list_get(factname, head);
> >> + if (old_fact && strncmp(old_fact->value, factvalue, 9) == 0)
> >> + old_fact->present = true;
> >> + else
> >> + igt_facts_list_add(factname, factvalue, last_test, head);
> >> +
> >> + free(codename);
> >> + free(model);
> >> + free(factname);
> >> + free(factvalue);
> >> + udev_device_unref(udev_dev);
> >> + }
> >> +
> >> + igt_facts_list_sweep(head, last_test);
> >> +
> >> +out:
> >> + udev_enumerate_unref(enumerate);
> >> + udev_unref(udev);
> >> +}
> >> +
> >> +/**
> >> + * igt_facts_scan_pci_drm_cards:
> >> + * @last_test: name of the last test
> >> + *
> >> + * This function scans the pci bus for drm cards using udev. It uses the
> >> + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to
> >> + * update igt_facts_list_drm_card_head.
> >> + *
> >> + * Returns: void
> >> + */
> >> +static void igt_facts_scan_pci_drm_cards(const char *last_test)
> >> +{
> >> + static struct igt_list_head *head = &igt_facts_list_drm_card_head;
> >> + struct udev *udev = NULL;
> >> + struct udev_enumerate *enumerate = NULL;
> >> + struct udev_list_entry *devices, *dev_list_entry;
> >> + int ret;
> >> + char *factname = NULL;
> >> + char *factvalue = NULL;
> >> +
> >> + udev = udev_new();
> >> + if (!udev)
> >> + return;
> >> +
> >> + enumerate = udev_enumerate_new(udev);
> >> + if (!enumerate) {
> >> + udev_unref(udev);
> >> + return;
> >> + }
> >> +
> >> + ret = udev_enumerate_add_match_subsystem(enumerate, "drm");
> >> + if (ret < 0)
> >> + goto out;
> >> +
> >> + ret = udev_enumerate_scan_devices(enumerate);
> >> + if (ret < 0)
> >> + goto out;
> >> +
> >> + devices = udev_enumerate_get_list_entry(enumerate);
> >> + if (!devices)
> >> + goto out;
> >> +
> >> + ret = udev_enumerate_add_match_subsystem(enumerate, "drm");
> >> + if (ret < 0)
> >> + goto out;
> >> +
> >> + ret = udev_enumerate_scan_devices(enumerate);
> >> + if (ret < 0)
> >> + goto out;
> >> +
> >> + devices = udev_enumerate_get_list_entry(enumerate);
> >> + if (!devices)
> >> + goto out;
> >> +
> >> + igt_facts_list_mark(head);
> >> +
> >> + udev_list_entry_foreach(dev_list_entry, devices) {
> >> + const char *path;
> >> + struct udev_device *drm_dev, *pci_dev;
> >> + const char *drm_name, *pci_addr;
> >> +
> >> + path = udev_list_entry_get_name(dev_list_entry);
> >> + drm_dev = udev_device_new_from_syspath(udev, path);
> >> + if (!drm_dev)
> >> + continue;
> >> +
> >> + drm_name = udev_device_get_sysname(drm_dev);
> >> + /* Filter the device by name. Want devices such as card0 and card1.
> >> + * If the device has '-' in the name, contine
> >> + */
> >> + if (strncmp(drm_name, "card", 4) != 0 ||
> >> + strchr(drm_name, '-') != NULL) {
> >> + udev_device_unref(drm_dev);
> >> + continue;
> >> + }
> >> +
> >> + /* Get the pci address of the gpu associated with the drm_dev*/
> >> + pci_dev = udev_device_get_parent_with_subsystem_devtype(drm_dev,
> >> + "pci",
> >> + NULL);
> >> + if (pci_dev) {
> >> + pci_addr = udev_device_get_sysattr_value(pci_dev,
> >> + "address");
> >> + if (!pci_addr)
> >> + pci_addr = udev_device_get_sysname(pci_dev);
> >> + } else {
> >> + /* Some GPUs are platform devices. Ignore them. */
> >> + pci_addr = NULL;
> >> + udev_device_unref(drm_dev);
> >> + continue;
> >> + }
> >> +
> >> + asprintf(&factname, "%s.%s", drm_card_fact, pci_addr);
> >> + asprintf(&factvalue, "%s", drm_name);
> >> +
> >> + igt_facts_list_add(factname, factvalue, last_test, head);
> >> +
> >> + free(factname);
> >> + free(factvalue);
> >> + udev_device_unref(drm_dev);
> >> + }
> >> +
> >> + igt_facts_list_sweep(head, last_test);
> >> +
> >> +out:
> >> + udev_enumerate_unref(enumerate);
> >> + udev_unref(udev);
> >> +}
> >> +
> >> +/**
> >> + * igt_facts_scan_kernel_taints:
> >> + * @last_test: name of the last test
> >> + *
> >> + * This function scans for kernel taints using igt_kernel_tainted() and
> >> + * igt_explain_taints(). It will cut off the explanation keeping only the
> >> + * taint name.
> >> + *
> >> + * Returns: void
> >> + */
> >> +static void igt_facts_scan_kernel_taints(const char *last_test)
> >> +{
> >> + static struct igt_list_head *head = &igt_facts_list_ktaint_head;
> >> + unsigned long taints = 0;
> >> + const char *reason = NULL;
> >> + char *taint_name = NULL;
> >> + char *fact_name = NULL;
> >> +
> >> + taints = igt_kernel_tainted(&taints);
> >> + /* For testing, set all bits to 1
> >> + * taints = 0xFFFFFFFF;
> >> + */
> >> +
> >> +
> >> + igt_facts_list_mark(head);
> >> +
> >> + while ((reason = igt_explain_taints(&taints)) != NULL) {
> >> + /* Cut at the ':' to get only the taint name */
> >> + taint_name = strtok(strdup(reason), ":");
> >> + if (!taint_name)
> >> + continue;
> >> +
> >> + /* Lowercase taint_name */
> >> + for (int i = 0; taint_name[i]; i++)
> >> + taint_name[i] = tolower(taint_name[i]);
> >> +
> >> + asprintf(&fact_name, "%s.%s", ktaint_fact, taint_name);
> >> + igt_facts_list_add(fact_name, "true", last_test, head);
> >> +
> >> + free(taint_name);
> >> + free(fact_name);
> >> + }
> >> +
> >> + igt_facts_list_sweep(head, last_test);
> >> +}
> >> +
> >> +
> >> +/**
> >> + * igt_facts_scan_kernel_loaded_kmods:
> >> + * @last_test: name of the last test
> >> + *
> >> + * This function scans for loaded kmods using igt_fact_kmod_list and
> >> + * igt_kmod_is_loaded().
> >> + *
> >> + * Returns: void
> >> + */
> >> +static void igt_facts_scan_kernel_loaded_kmods(const char *last_test)
> >> +{
> >> + static struct igt_list_head *head = &igt_facts_list_kmod_head;
> >> + char *name = NULL;
> >> +
> >> + igt_facts_list_mark(head);
> >> +
> >> + /* Iterate over igt_fact_kmod_list[] until the element contains "\0" */
> >> + for (int i = 0; strcmp(igt_fact_kmod_list[i], "\0") != 0; i++) {
> >> + asprintf(&name, "%s.%s", kmod_fact, igt_fact_kmod_list[i]);
> >> + if (igt_kmod_is_loaded(igt_fact_kmod_list[i]))
> >> + igt_facts_list_add(name, "true", last_test, head);
> >> +
> >> + free(name);
> >> + }
> >> +
> >> + igt_facts_list_sweep(head, last_test);
> >> +}
> >> +
> >> +/**
> >> + * igt_facts:
> >> + * @last_test: name of the last test
> >> + *
> >> + * Call this function where you want to gather and report facts.
> >> + *
> >> + * Returns: void
> >> + */
> >> +void igt_facts(const char *last_test)
> >> +{
> >> + igt_facts_scan_pci_gpus(last_test);
> >> + igt_facts_scan_pci_drm_cards(last_test);
> >> + igt_facts_scan_kernel_taints(last_test);
> >> + igt_facts_scan_kernel_loaded_kmods(last_test);
> >> +
> >> + fflush(stdout);
> >> + fflush(stderr);
> >> +}
> >> +
> >> +/*
> >> + * Testing
> >> + *
> >> + * Defined here to keep most of the functions static
> >> + *
> >> + */
> >> +
> >> +/**
> >> + * igt_facts_test_add_get:
> >> + * @head: head of the list
> >> + *
> >> + * Tests igt_facts_list_add and igt_facts_list_get.
> >> + *
> >> + * Returns: void
> >> + */
> >> +static void igt_facts_test_add_get(struct igt_list_head *head)
> >> +{
> >> + igt_fact *fact = NULL;
> >> + bool ret;
> >> + const char *name = "hardware.pci.gpu_at_addr.0000:00:02.0";
> >> + const char *value = "8086:64a0 Intel Lunarlake (Gen20)";
> >> + const char *last_test = NULL;
> >> +
> >> + ret = igt_facts_list_add(name, value, last_test, head);
> >> + igt_assert(ret == true);
> >> +
> >> + // Assert that there is one element in the linked list
> >> + igt_assert_eq(igt_list_length(head), 1);
> >> +
> >> + // Assert that the element in the linked list is the one we added
> >> + fact = igt_facts_list_get(name, head);
> >> + igt_assert(fact != NULL);
> >> + igt_assert_eq(strcmp(fact->name, name), 0);
> >> + igt_assert_eq(strcmp(fact->value, value), 0);
> >> + igt_assert(fact->present == true);
> >> + igt_assert(fact->last_test == NULL);
> >> +}
> >> +
> >> +/**
> >> + * igt_facts_test_mark_and_sweep:
> >> + * @head: head of the list
> >> + *
> >> + * - Add 3 elements to the list and mark them as not present.
> >> + * - Update two of the elements and mark them as present.
> >> + * - Sweep the list and assert that
> >> + * - Only the two updated elements are present
> >> + * - The third element was deleted
> >> + *
> >> + * Returns: void
> >> + */
> >> +static void igt_facts_test_mark_and_sweep(struct igt_list_head *head)
> >> +{
> >> + igt_fact *fact = NULL;
> >> + const char *name1 = "hardware.pci.gpu_at_addr.0000:00:02.0";
> >> + const char *value1 = "8086:64a0 Intel Lunarlake (Gen20)";
> >> + const char *name2 = "hardware.pci.gpu_at_addr.0000:00:03.0";
> >> + const char *value2 = "8086:64a1 Intel Lunarlake (Gen21)";
> >> + const char *name3 = "hardware.pci.gpu_at_addr.0000:00:04.0";
> >> + const char *value3 = "8086:64a2 Intel Lunarlake (Gen22)";
> >> +
> >> + igt_facts_list_add(name1, value1, NULL, head);
> >> + igt_facts_list_add(name2, value2, NULL, head);
> >> + igt_facts_list_add(name3, value3, NULL, head);
> >> +
> >> + igt_facts_list_mark(head);
> >> +
> >> + igt_facts_list_add(name1, value1, NULL, head);
> >> + igt_facts_list_add(name2, value2, NULL, head);
> >> +
> >> + igt_facts_list_sweep(head, NULL);
> >> +
> >> + // Assert that there are two elements in the linked list
> >> + igt_assert_eq(igt_list_length(head), 2);
> >> +
> >> + // Assert that the two updated elements are present
> >> + fact = igt_facts_list_get(name1, head);
> >> + igt_assert(fact != NULL);
> >> + igt_assert(fact->present == true);
> >> +
> >> + fact = igt_facts_list_get(name2, head);
> >> + igt_assert(fact != NULL);
> >> + igt_assert(fact->present == true);
> >> +
> >> + // Assert that the third element was deleted
> >> + fact = igt_facts_list_get(name3, head);
> >> + igt_assert(fact == NULL);
> >> +}
> >> +
> >> +/**
> >> + * igt_facts_test:
> >> + *
> >> + * Main function for testing the igt_facts module
> >> + *
> >> + * Returns: bool indicating if the tests passed
> >> + */
> >> +void igt_facts_test(void)
> >> +{
> >> + const char *last_test = "Unit Testing";
> >> +
> >> + igt_facts_lists_init();
> >> +
> >> + /* Assert that all lists are empty */
> >> + igt_assert(igt_list_empty(&igt_facts_list_kmod_head));
> >> + igt_assert(igt_list_empty(&igt_facts_list_ktaint_head));
> >> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head));
> >> + igt_assert(igt_list_empty(&igt_facts_list_drm_card_head));
> >> +
> >> + /* Assert that add and get work. Will add one element to the list */
> >> + igt_facts_test_add_get(&igt_facts_list_pci_gpu_head);
> >> +
> >> + /* Assert that igt_facts_list_mark_and_sweep() cleans up the list */
> >> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == false);
> >> + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head);
> >> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == true);
> >> +
> >> + /* Test the mark and sweep pattern used to delete elements
> >> + * from the list
> >> + */
> >> + igt_facts_test_mark_and_sweep(&igt_facts_list_pci_gpu_head);
> >> +
> >> + /* Clean up the list and call igt_facts(). This should not crash */
> >> + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head);
> >> + igt_facts(last_test);
> >> +}
> >> diff --git a/lib/igt_facts.h b/lib/igt_facts.h
> >> new file mode 100644
> >> index 000000000..e4adca3fb
> >> --- /dev/null
> >> +++ b/lib/igt_facts.h
> >> @@ -0,0 +1,47 @@
> >> +/* SPDX-License-Identifier: MIT
> >> + * Copyright © 2024 Intel Corporation
> >> + */
> >> +
> >> +#include <stdbool.h>
> >> +
> >> +#include "igt_list.h"
> >> +
> >> +
> >> +/* igt_fact:
> >> + * @name: name of the fact
> >> + * @value: value of the fact
> >> + * @last_test: name of the test that triggered the fact
> >> + * @present: bool indicating if fact is present. Used for deleting facts from
> >> + * the list.
> >> + * @link: link to the next fact
> >> + *
> >> + * A fact is a piece of information that can be used to determine the state of
> >> + * the system.
> >> + *
> >> + */
> >> +typedef struct {
> >> + char *name;
> >> + char *value;
> >> + char *last_test;
> >> + bool present; /* For mark and seep */
> >> + struct igt_list_head link;
> >> +} igt_fact;
> >> +
> >> +const char *igt_fact_kmod_list[] = {
> >> + "amdgpu",
> >> + "i915",
> >> + "nouveau",
> >> + "radeon",
> >> + "xe",
> >> + "\0"
> >> +};
> >> +
> >> +const char *kmod_fact = "kernel.kmod_is_loaded"; /* true or false */
> >> +const char *ktaint_fact = "kernel.is_tainted"; /* taint name: taint_warn */
> >> +const char *pci_gpu_fact = "hardware.pci.gpu_at_addr"; /* id vendor model */
> >> +const char *drm_card_fact = "hardware.pci.drm_card_at_addr"; /* cardX */
> >> +
> >> +void igt_facts_lists_init(void);
> >> +void igt_facts(const char *last_test);
> >> +bool igt_facts_are_all_lists_empty(void);
> >> +void igt_facts_test(void); /* For unit testing only */
> >> diff --git a/lib/meson.build b/lib/meson.build
> >> index c3556a921..c44ca2b5a 100644
> >> --- a/lib/meson.build
> >> +++ b/lib/meson.build
> >> @@ -18,6 +18,7 @@ lib_sources = [
> >> 'i915/i915_crc.c',
> >> 'igt_collection.c',
> >> 'igt_color_encoding.c',
> >> + 'igt_facts.c',
> >> 'igt_crc.c',
> >> 'igt_debugfs.c',
> >> 'igt_device.c',
> >> diff --git a/lib/tests/igt_facts.c b/lib/tests/igt_facts.c
> >> new file mode 100644
> >> index 000000000..7fa9d0f22
> >> --- /dev/null
> >> +++ b/lib/tests/igt_facts.c
> >> @@ -0,0 +1,15 @@
> >> +// SPDX-License-Identifier: MIT
> >> +// Copyright © 2024 Intel Corporation
> >> +
> >> +#include <stdbool.h>
> >> +
> >> +#include "igt_core.h"
> >> +#include "igt_facts.h"
> >> +
> >> +/* Tests are not defined here so we can keep most of the functions static */
> >> +
> >> +igt_simple_main
> >> +{
> >> + igt_info("Running igt_facts_test\n");
> >> + igt_facts_test();
> >> +}
> >> diff --git a/lib/tests/meson.build b/lib/tests/meson.build
> >> index df8092638..1ce19f63c 100644
> >> --- a/lib/tests/meson.build
> >> +++ b/lib/tests/meson.build
> >> @@ -8,6 +8,7 @@ lib_tests = [
> >> 'igt_dynamic_subtests',
> >> 'igt_edid',
> >> 'igt_exit_handler',
> >> + 'igt_facts',
> >> 'igt_fork',
> >> 'igt_fork_helper',
> >> 'igt_hook',
> >> diff --git a/runner/executor.c b/runner/executor.c
> >> index ac73e1dde..d1eca3c05 100644
> >> --- a/runner/executor.c
> >> +++ b/runner/executor.c
> >> @@ -30,6 +30,7 @@
> >>
> >> #include "igt_aux.h"
> >> #include "igt_core.h"
> >> +#include "igt_facts.h"
> >> #include "igt_taints.h"
> >> #include "igt_vec.h"
> >> #include "executor.h"
> >> @@ -2306,6 +2307,9 @@ bool execute(struct execute_state *state,
> >> sigset_t sigmask;
> >> double time_spent = 0.0;
> >> bool status = true;
> >> + char *last_test = NULL;
> >> +
> >> + igt_facts_lists_init();
> >>
> >> if (state->dry) {
> >> outf("Dry run, not executing. Invoke igt_resume if you want to execute.\n");
> >> @@ -2438,6 +2442,10 @@ bool execute(struct execute_state *state,
> >> int result;
> >> bool already_written = false;
> >>
> >> + /* Calls before running each test */
> >> + igt_facts(last_test);
> >> + last_test = entry_display_name(&job_list->entries[state->next]);
> >> +
> >> if (should_die_because_signal(sigfd)) {
> >> status = false;
> >> goto end;
> >> @@ -2526,6 +2534,8 @@ bool execute(struct execute_state *state,
> >> return execute(state, settings, job_list);
> >> }
> >> }
> >> + /* Last call to collect facts after the last test runs */
> >> + igt_facts(last_test);
> >>
> >> if ((timefd = openat(resdirfd, "endtime.txt", O_CREAT | O_WRONLY | O_EXCL, 0666)) >= 0) {
> >> dprintf(timefd, "%f\n", timeofday_double());
> >> diff --git a/tools/lsfacts.c b/tools/lsfacts.c
> >> new file mode 100644
> >> index 000000000..10dee0317
> >> --- /dev/null
> >> +++ b/tools/lsfacts.c
> >> @@ -0,0 +1,25 @@
> >> +// SPDX-License-Identifier: MIT
> >> +// Copyright © 2024 Intel Corporation
> >> +
> >> +#include "igt.h"
> >> +#include "igt_facts.h"
> >> +
> >> +/**
> >> + * SECTION:lsfacts
> >> + * @short_description: lsfacts
> >> + * @title: lsfacts
> >> + * @include: lsfacts.c
> >> + *
> >> + * # lsfacts
> >> + *
> >> + * Scan for igt-facts and print them on screen. Indicate if no facts are found.
> >> + */
> >> +int main(int argc, char *argv[])
> >> +{
> >> + igt_facts_lists_init();
> >> +
> >> + igt_facts("lsfacts");
> >> +
> >> + if (igt_facts_are_all_lists_empty())
> >> + igt_info("No facts found...\n");
> >> +}
> >> diff --git a/tools/meson.build b/tools/meson.build
> >> index 48c9a4b50..ff1b0ef90 100644
> >> --- a/tools/meson.build
> >> +++ b/tools/meson.build
> >> @@ -42,6 +42,7 @@ tools_progs = [
> >> 'intel_gem_info',
> >> 'intel_gvtg_test',
> >> 'dpcd_reg',
> >> + 'lsfacts',
> >> 'lsgpu',
> >> 'power',
> >> ]
> >>
> >
> >
> >
> >
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking
2024-12-09 9:17 ` Janusz Krzysztofik
@ 2024-12-09 11:06 ` Peter Senna Tschudin
2024-12-09 13:50 ` Janusz Krzysztofik
0 siblings, 1 reply; 13+ messages in thread
From: Peter Senna Tschudin @ 2024-12-09 11:06 UTC (permalink / raw)
To: Janusz Krzysztofik, igt-dev@lists.freedesktop.org
Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi,
luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray,
dominik.karol.piatkowski, katarzyna.piecielska
Hi Janusz,
On 09.12.2024 10:17, Janusz Krzysztofik wrote:
> On Friday, 6 December 2024 06:45:31 CET Peter Senna Tschudin wrote:
>> Hi Janusz,
>>
>> Thank you for your detailed comments. I appreciate the opportunity
>> to clarify and address your concerns.
>>
>> On 05.12.2024 15:05, Janusz Krzysztofik wrote:
>>> Hi Peter,
>>>
>>> On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote:
>>>> When using igt-runner, collect facts before each test and after the
>>>> last test, and report when facts change. The facts are:
>>>> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
>>>> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
>>>> - Kernel taints: kernel.is_tainted.taint_warn: true
>>>> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
>>>>
>>>> This change imposes little execution overhead and adds just a few
>>>> lines of logging. The facts will be printed on normal igt-runner
>>>> output. Here is a real example from our CI shwoing
>>>> hotreplug-lateclose changing the DRM card number
>>>
>>> Since you give that as an example of how helpful your facts can be, and follow
>>> that with a kernel taint example, that may indicate you think, and users of
>>> your facts may then be mislead having that read, that the taint was related to
>>> the change of card number, while both had nothing to do with each other.
>>
>> Let’s take a step back to define the purpose and scope of igt-facts:
>> - Definition of a fact from the dictionary: A fact is an objectively verifiable
>> piece of information.
>> - Purpose of igt-facts: Track which tests cause changes to the facts.
>>
>> The operation is straightforward: facts are collected before and after each test,
>> and any differences are logged. Here’s an example showing a fact change and a new
>> fact after running hotreplug-lateclose:
>>
>> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
>> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
>>
>> This output highlights the facts without implying causation between them. The
>> tool(and my commit message) neither explains relationships between facts nor
>> misleads users into assuming causation.
>
> For me your commit message does.
Can you please provide some external support for your claims?
I am very careful while writing, and I am aware of my own limitations.
After reading what chatGPT wrote about my text, I felt comfortable
about not misleading anyone. Please mind the wording of my
question: I did not hint a desired outcome for chatGPT.
My Question:
-- // --
Can you please indicate any potential causality relationship
that the following text may offer?
Here is a real example from our CI shwoing hotreplug-lateclose
changing the DRM card number and tainting the kernel on the
abort path
-- // --
chatGPT answer:
--//--
The text suggests a potential causality relationship between the
"hotreplug-lateclose" operation and its effects on the system:
1. Causal Event: The "hotreplug-lateclose" operation.
This likely refers to a process in the CI(Continuous Integration)
environment related to hotplugging devices and managing late-close
scenarios. Hotplugging involves dynamically adding or removing
hardware components without shutting down the system.
2. Effects:
* Changing the DRM card number: This implies that the operation
affects the enumeration or reordering of DRM (Direct Rendering
Manager) devices, causing the card number to be reassigned.
* Kernel Tainting on the abort path: Kernel tainting typically
indicates that the kernel has entered a state that is not
guaranteed to be stable or reliable. This could occur due
to improper handling of errors or unexpected conditions
during the "abort path" (likely a failure or cleanup process
in the operation).
Suggested Causal Chain:
1. The "hotreplug-lateclose" operation executes in the CI.
2. This causes a change in DRM card enumeration (potentially due
to how devices are initialized or re-initialized).
3. On the abort path of this operation, a condition arises that
taints the kernel (possibly due to resource cleanup issues,
unhandled errors, or driver misbehavior).
This text highlights a possible link between hotreplug-lateclose
handling and instabilities in DRM device management as well as
kernel state integrity.
--//--
>
> Can you please provide a full list of "facts" your code is supposed to handle?
This is in the commit message already, at the very begining.
> Can you please explain why you selected just those "facts", not others?
It was either what was missing, such as a convenient way to learn when
something strange happend as a gpu disappearing from the PCI bus, or
something that we believe may cause errors downstream such as a taint,
and the list of loaded modules.
For the drm card number association, we belive that there may be a caching
issue: we are trying to figure it out if the drm-reopen cache handles the
change of drm number association well. We have weak information pointing
to a probable problem akin to missing cache invalidation.
Thanks!
[...]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking
2024-12-09 11:06 ` Peter Senna Tschudin
@ 2024-12-09 13:50 ` Janusz Krzysztofik
2024-12-10 8:38 ` Peter Senna Tschudin
0 siblings, 1 reply; 13+ messages in thread
From: Janusz Krzysztofik @ 2024-12-09 13:50 UTC (permalink / raw)
To: igt-dev@lists.freedesktop.org, Peter Senna Tschudin
Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi,
luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray,
dominik.karol.piatkowski, katarzyna.piecielska
On Monday, 9 December 2024 12:06:33 CET Peter Senna Tschudin wrote:
> Hi Janusz,
>
> On 09.12.2024 10:17, Janusz Krzysztofik wrote:
> > On Friday, 6 December 2024 06:45:31 CET Peter Senna Tschudin wrote:
> >> Hi Janusz,
> >>
> >> Thank you for your detailed comments. I appreciate the opportunity
> >> to clarify and address your concerns.
> >>
> >> On 05.12.2024 15:05, Janusz Krzysztofik wrote:
> >>> Hi Peter,
> >>>
> >>> On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote:
> >>>> When using igt-runner, collect facts before each test and after the
> >>>> last test, and report when facts change. The facts are:
> >>>> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
> >>>> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
> >>>> - Kernel taints: kernel.is_tainted.taint_warn: true
> >>>> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
> >>>>
> >>>> This change imposes little execution overhead and adds just a few
> >>>> lines of logging. The facts will be printed on normal igt-runner
> >>>> output. Here is a real example from our CI shwoing
> >>>> hotreplug-lateclose changing the DRM card number
> >>>
> >>> Since you give that as an example of how helpful your facts can be, and follow
> >>> that with a kernel taint example, that may indicate you think, and users of
> >>> your facts may then be mislead having that read, that the taint was related to
> >>> the change of card number, while both had nothing to do with each other.
> >>
> >> Let’s take a step back to define the purpose and scope of igt-facts:
> >> - Definition of a fact from the dictionary: A fact is an objectively verifiable
> >> piece of information.
> >> - Purpose of igt-facts: Track which tests cause changes to the facts.
> >>
> >> The operation is straightforward: facts are collected before and after each test,
> >> and any differences are logged. Here’s an example showing a fact change and a new
> >> fact after running hotreplug-lateclose:
> >>
> >> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
> >> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
> >>
> >> This output highlights the facts without implying causation between them. The
> >> tool(and my commit message) neither explains relationships between facts nor
> >> misleads users into assuming causation.
> >
> > For me your commit message does.
>
> Can you please provide some external support for your claims?
>
> I am very careful while writing, and I am aware of my own limitations.
> After reading what chatGPT wrote about my text, I felt comfortable
> about not misleading anyone. Please mind the wording of my
> question: I did not hint a desired outcome for chatGPT.
>
> My Question:
> -- // --
> Can you please indicate any potential causality relationship
> that the following text may offer?
>
> Here is a real example from our CI shwoing hotreplug-lateclose
> changing the DRM card number and tainting the kernel on the
> abort path
> -- // --
>
> chatGPT answer:
> --//--
> The text suggests a potential causality relationship between the
> "hotreplug-lateclose" operation and its effects on the system:
>
> 1. Causal Event: The "hotreplug-lateclose" operation.
> This likely refers to a process in the CI(Continuous Integration)
> environment related to hotplugging devices and managing late-close
> scenarios. Hotplugging involves dynamically adding or removing
> hardware components without shutting down the system.
>
> 2. Effects:
>
> * Changing the DRM card number: This implies that the operation
> affects the enumeration or reordering of DRM (Direct Rendering
> Manager) devices, causing the card number to be reassigned.
>
> * Kernel Tainting on the abort path: Kernel tainting typically
> indicates that the kernel has entered a state that is not
> guaranteed to be stable or reliable. This could occur due
> to improper handling of errors or unexpected conditions
> during the "abort path" (likely a failure or cleanup process
> in the operation).
>
> Suggested Causal Chain:
> 1. The "hotreplug-lateclose" operation executes in the CI.
> 2. This causes a change in DRM card enumeration (potentially due
> to how devices are initialized or re-initialized).
> 3. On the abort path of this operation, a condition arises that
> taints the kernel (possibly due to resource cleanup issues,
> unhandled errors, or driver misbehavior).
>
> This text highlights a possible link between hotreplug-lateclose
> handling and instabilities in DRM device management as well as
> kernel state integrity.
> --//--
>
>
>
> >
> > Can you please provide a full list of "facts" your code is supposed to handle?
>
> This is in the commit message already, at the very begining.
>
> > Can you please explain why you selected just those "facts", not others?
>
> It was either what was missing, such as a convenient way to learn when
> something strange happend as a gpu disappearing from the PCI bus, or
> something that we believe may cause errors downstream such as a taint,
> and the list of loaded modules.
>
> For the drm card number association, we belive that there may be a caching
> issue: we are trying to figure it out if the drm-reopen cache handles the
> change of drm number association well. We have weak information pointing
> to a probable problem akin to missing cache invalidation.
Let's have a look at CI reports from igt@core_hotunplug@hotre* subtests that
resulted in kernel taint:
https://gfx-ci.igk.intel.com/cibuglog-ng/results/all?query=test_name%20~=%20%22^igt@core_hotunplug@hotre%22%20AND%20runconfig_tag%20CONTAINS%20%22xe%22%20AND%20status_name%20NOT%20IN%20[%22notrun%22,%22pass%22,%22skip%22]%20AND%20dmesg%20~=%20%22[tT]aint%22
Here are three cases from the top of the list at the time of this writing:
1.
https://intel-gfx-ci.01.org/tree/intel-xe/IGT_8142/shard-bmg-5/igt@core_hotunplug@hotrebind-lateclose.html
stdout:
...
Opened device: /dev/dri/card0
...
Opened device: /dev/dri/renderD128
...
Opened device: /dev/dri/card1
Opened device: /dev/dri/renderD129
dmesg:
...
<6> [339.587085] [IGT] core_hotunplug: starting subtest hotrebind-lateclose
...
<4> [340.531229] WARNING: CPU: 8 PID: 45 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe]
...
<4> [340.532206] CPU: 8 UID: 0 PID: 45 Comm: kworker/8:0 Tainted: G W 6.13.0-rc1-xe+ #1
<4> [340.532221] Tainted: [W]=WARN
...
<7> [340.617993] [IGT] core_hotunplug: opening DRM device for hot rebind
...
<4> [340.929113] WARNING: CPU: 2 PID: 4067 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe]
...
<4> [340.930040] CPU: 2 UID: 0 PID: 4067 Comm: core_hotunplug Tainted: G W 6.13.0-rc1-xe+ #1
<4> [340.930054] Tainted: [W]=WARN
...
<4> [341.059764] WARNING: CPU: 2 PID: 4067 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe]
...
<4> [341.059995] CPU: 2 UID: 0 PID: 4067 Comm: core_hotunplug Tainted: G W 6.13.0-rc1-xe+ #1
<4> [341.059999] Tainted: [W]=WARN
...
<7> [341.348551] [IGT] core_hotunplug: rebinding the driver to the device
...
<6> [341.614815] [drm] Initialized xe 1.1.0 for 0000:03:00.0 on minor 1
...
2.
https://intel-gfx-ci.01.org/tree/intel-xe/xe-2331-e57b4b7cd137e0ae00d2601b4b55ab7f504da422/shard-bmg-8/igt@core_hotunplug@hotrebind-lateclose.html
stdout:
...
Opened device: /dev/dri/card1
...
Opened device: /dev/dri/renderD129
...
Opened device: /dev/dri/card0
Opened device: /dev/dri/renderD128
...
dmesg:
...
<7> [483.134426] [IGT] core_hotunplug: opening DRM device for hot rebind
...
<4> [483.441514] WARNING: CPU: 8 PID: 5465 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe]
...
<4> [483.442441] CPU: 8 UID: 0 PID: 5465 Comm: core_hotunplug Tainted: G W 6.13.0-rc1-xe+ #1
<4> [483.442456] Tainted: [W]=WARN
...
<4> [483.586987] WARNING: CPU: 0 PID: 5465 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe]
...
<4> [483.587210] CPU: 0 UID: 0 PID: 5465 Comm: core_hotunplug Tainted: G W 6.13.0-rc1-xe+ #1
<4> [483.587214] Tainted: [W]=WARN
...
<7> [484.026293] [IGT] core_hotunplug: rebinding the driver to the device
...
<6> [484.291135] [drm] Initialized xe 1.1.0 for 0000:03:00.0 on minor 0
...
3.
https://intel-gfx-ci.01.org/tree/intel-xe/xe-2330-784dd0b20e39add60971ccdd5d2f7f3c27cf37bb/shard-bmg-4/igt@core_hotunplug@hotrebind-lateclose.html
stdout:
...
Opened device: /dev/dri/card1
...
Opened device: /dev/dri/renderD129
...
Opened device: /dev/dri/card0
Opened device: /dev/dri/renderD128
...
dmesg:
...
<6> [483.536630] [IGT] core_hotunplug: starting subtest hotrebind-lateclose
...
<4> [484.402689] WARNING: CPU: 4 PID: 4167 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe]
...
<4> [484.403663] CPU: 4 UID: 0 PID: 4167 Comm: kworker/4:7 Tainted: G W 6.13.0-rc1-xe+ #1
<4> [484.403677] Tainted: [W]=WARN
...
<7> [484.563261] [IGT] core_hotunplug: closing health checked device instance
...
<4> [484.641694] WARNING: CPU: 9 PID: 4887 at drivers/gpu/drm/i915/display/intel_display.c:416 intel_disable_transcoder+0x322/0x400 [xe]
...
<4> [484.642621] CPU: 9 UID: 0 PID: 4887 Comm: kworker/9:4 Tainted: G W 6.13.0-rc1-xe+ #1
<4> [484.642635] Tainted: [W]=WARN
...
<7> [484.725550] [IGT] core_hotunplug: closing health checked device instance
...
<4> [484.873571] WARNING: CPU: 9 PID: 4887 at drivers/gpu/drm/i915/display/intel_display.c:416 intel_disable_transcoder+0x322/0x400 [xe]
...
<4> [484.874656] CPU: 9 UID: 0 PID: 4887 Comm: kworker/9:4 Tainted: G W 6.13.0-rc1-xe+ #1
<4> [484.874672] Tainted: [W]=WARN
...
<7> [484.959255] [IGT] core_hotunplug: opening DRM device for hot rebind
...
<4> [485.126668] WARNING: CPU: 9 PID: 4887 at drivers/gpu/drm/i915/display/intel_display.c:416 intel_disable_transcoder+0x322/0x400 [xe]
...
<4> [485.127401] CPU: 9 UID: 0 PID: 4887 Comm: kworker/9:4 Tainted: G W 6.13.0-rc1-xe+ #1
<4> [485.127413] Tainted: [W]=WARN
...
<4> [485.309106] WARNING: CPU: 0 PID: 5276 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe]
...
<4> [485.309290] CPU: 0 UID: 0 PID: 5276 Comm: core_hotunplug Tainted: G W 6.13.0-rc1-xe+ #1
<4> [485.309293] Tainted: [W]=WARN
...
...
<7> [485.941797] [IGT] core_hotunplug: rebinding the driver to the device
...
<6> [486.209963] [drm] Initialized xe 1.1.0 for 0000:03:00.0 on minor 0
...
In all the above reports I can see information on changed device minor number
and kernel tainted.
Now let's try to find an abort with no relevant information in dmesg (i.e.,
neither explicit info about kernel taint, nor BUG nor WARNING kernel messages
-- events that mark the kernel tainted):
https://gfx-ci.igk.intel.com/cibuglog-ng/results/all?query=test_name+~%3D+%22^igt%40core_hotunplug%40hotre%22+AND+status_name+IS+IN+[%22abort%22]+AND+NOT+dmesg+~%3D+%22[tT]aint|BUG|WARNING%22
I can't find any, can you?
Let's also try to find reports from executions where the driver was
succesfully rebound but information about potentially changed device minor
number was not provided:
https://gfx-ci.igk.intel.com/cibuglog-ng/results/all?query=test_name%20~=%20%22^igt@core_hotunplug@hotre%22%20AND%20status_name%20NOT%20IN%20[%22notrun%22,%22pass%22,%22skip%22]%20AND%20dmesg%20~=%20%22opening%20DRM%20device%20for%20hot(.*\n)*.*\[drm\]%20Initialized(.*\n)*.*opening%20DRM%20device%20for%20health%20check%22%20AND%20NOT%20stdout%20~=%20%22\nOpened%20device:%20/dev/dri/renderD12[0-9]\n(.*\n)*Opened%20device:%20/dev/dri/card[0-9]\n%22
I can see only cases where IGT runner didn't include such information from
stdout in the report because it killed the test after timeout and stopped
tracing its stdout before the test attempted to rebind the driver and had a
chance to print those messages to stdout.
*What value do your facts add to CI reporting then?*
Thanks,
Janusz
>
>
> Thanks!
>
> [...]
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking
2024-12-09 13:50 ` Janusz Krzysztofik
@ 2024-12-10 8:38 ` Peter Senna Tschudin
2024-12-10 9:50 ` Janusz Krzysztofik
0 siblings, 1 reply; 13+ messages in thread
From: Peter Senna Tschudin @ 2024-12-10 8:38 UTC (permalink / raw)
To: Janusz Krzysztofik, igt-dev@lists.freedesktop.org
Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi,
luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray,
dominik.karol.piatkowski, katarzyna.piecielska
Very good morning Janusz,
Thank you for your detailed question.
On 09.12.2024 14:50, Janusz Krzysztofik wrote:
> On Monday, 9 December 2024 12:06:33 CET Peter Senna Tschudin wrote:
>> Hi Janusz,
>>
>> On 09.12.2024 10:17, Janusz Krzysztofik wrote:
>>> On Friday, 6 December 2024 06:45:31 CET Peter Senna Tschudin wrote:
>>>> Hi Janusz,
>>>>
>>>> Thank you for your detailed comments. I appreciate the opportunity
>>>> to clarify and address your concerns.
>>>>
>>>> On 05.12.2024 15:05, Janusz Krzysztofik wrote:
>>>>> Hi Peter,
>>>>>
>>>>> On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote:
>>>>>> When using igt-runner, collect facts before each test and after the
>>>>>> last test, and report when facts change. The facts are:
>>>>>> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
>>>>>> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
>>>>>> - Kernel taints: kernel.is_tainted.taint_warn: true
>>>>>> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
>>>>>>
>>>>>> This change imposes little execution overhead and adds just a few
>>>>>> lines of logging. The facts will be printed on normal igt-runner
>>>>>> output. Here is a real example from our CI shwoing
>>>>>> hotreplug-lateclose changing the DRM card number
>>>>>
>>>>> Since you give that as an example of how helpful your facts can be, and follow
>>>>> that with a kernel taint example, that may indicate you think, and users of
>>>>> your facts may then be mislead having that read, that the taint was related to
>>>>> the change of card number, while both had nothing to do with each other.
>>>>
>>>> Let’s take a step back to define the purpose and scope of igt-facts:
>>>> - Definition of a fact from the dictionary: A fact is an objectively verifiable
>>>> piece of information.
>>>> - Purpose of igt-facts: Track which tests cause changes to the facts.
>>>>
>>>> The operation is straightforward: facts are collected before and after each test,
>>>> and any differences are logged. Here’s an example showing a fact change and a new
>>>> fact after running hotreplug-lateclose:
>>>>
>>>> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
>>>> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
>>>>
>>>> This output highlights the facts without implying causation between them. The
>>>> tool(and my commit message) neither explains relationships between facts nor
>>>> misleads users into assuming causation.
>>>
>>> For me your commit message does.
>>
>> Can you please provide some external support for your claims?
>>
>> I am very careful while writing, and I am aware of my own limitations.
>> After reading what chatGPT wrote about my text, I felt comfortable
>> about not misleading anyone. Please mind the wording of my
>> question: I did not hint a desired outcome for chatGPT.
>>
>> My Question:
>> -- // --
>> Can you please indicate any potential causality relationship
>> that the following text may offer?
>>
>> Here is a real example from our CI shwoing hotreplug-lateclose
>> changing the DRM card number and tainting the kernel on the
>> abort path
>> -- // --
>>
>> chatGPT answer:
>> --//--
>> The text suggests a potential causality relationship between the
>> "hotreplug-lateclose" operation and its effects on the system:
>>
>> 1. Causal Event: The "hotreplug-lateclose" operation.
>> This likely refers to a process in the CI(Continuous Integration)
>> environment related to hotplugging devices and managing late-close
>> scenarios. Hotplugging involves dynamically adding or removing
>> hardware components without shutting down the system.
>>
>> 2. Effects:
>>
>> * Changing the DRM card number: This implies that the operation
>> affects the enumeration or reordering of DRM (Direct Rendering
>> Manager) devices, causing the card number to be reassigned.
>>
>> * Kernel Tainting on the abort path: Kernel tainting typically
>> indicates that the kernel has entered a state that is not
>> guaranteed to be stable or reliable. This could occur due
>> to improper handling of errors or unexpected conditions
>> during the "abort path" (likely a failure or cleanup process
>> in the operation).
>>
>> Suggested Causal Chain:
>> 1. The "hotreplug-lateclose" operation executes in the CI.
>> 2. This causes a change in DRM card enumeration (potentially due
>> to how devices are initialized or re-initialized).
>> 3. On the abort path of this operation, a condition arises that
>> taints the kernel (possibly due to resource cleanup issues,
>> unhandled errors, or driver misbehavior).
>>
>> This text highlights a possible link between hotreplug-lateclose
>> handling and instabilities in DRM device management as well as
>> kernel state integrity.
>> --//--
>>
>>
>>
>>>
>>> Can you please provide a full list of "facts" your code is supposed to handle?
>>
>> This is in the commit message already, at the very begining.
>>
>>> Can you please explain why you selected just those "facts", not others?
>>
>> It was either what was missing, such as a convenient way to learn when
>> something strange happend as a gpu disappearing from the PCI bus, or
>> something that we believe may cause errors downstream such as a taint,
>> and the list of loaded modules.
>>
>> For the drm card number association, we belive that there may be a caching
>> issue: we are trying to figure it out if the drm-reopen cache handles the
>> change of drm number association well. We have weak information pointing
>> to a probable problem akin to missing cache invalidation.
[...]
> *What value do your facts add to CI reporting then?*
It seems that the focus of your feedback has shifted from 'Peter is
misleading people' to questioning the value of igt-runner. I believe
it's important to address this change in perspective to ensure
clarity and productive discussion.
To address your new point, let me draw an analogy: asking "what is
the value of igt-runner?" is similar to questioning the use of a car
to reach the supermarket when walking is an option. While walking may
suffice, the car adds efficiency and convenience, especially for
larger or more complex "tasks".
In our CI setup, a typical test list includes between 100 and
200 tests. As you rightly pointed out, determining which test changed
a specific fact currently requires significant time and effort. When
multiple tests modify different facts, the process is not practical.
igt-facts adds value by:
- Clearly identifying which tests change which facts.
- Enabling us to know which facts were present and absent before the
first test runs.
- Reporting only changes, making the information concise and
efficient to review.
Moreover, for most developers, the igt-facts output will not be
obtrusive. In our CI runs, the output is saved only to the
igt_runner?? file and does not appear in dmesg or the results.json
file.
And even so, in the few hundred runs that I checked, igt-facts
typically added between 3 and 16 lines of log to a file that has
about 1000 lines.
I hope this explanation helps clarify the purpose and value of
igt-facts.
Thank you,
Peter
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking
2024-12-10 8:38 ` Peter Senna Tschudin
@ 2024-12-10 9:50 ` Janusz Krzysztofik
2024-12-10 13:41 ` Knop, Ryszard
0 siblings, 1 reply; 13+ messages in thread
From: Janusz Krzysztofik @ 2024-12-10 9:50 UTC (permalink / raw)
To: igt-dev@lists.freedesktop.org, Peter Senna Tschudin
Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi,
luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray,
dominik.karol.piatkowski, katarzyna.piecielska
Hi Peter,
On Tuesday, 10 December 2024 09:38:59 CET Peter Senna Tschudin wrote:
> Very good morning Janusz,
>
> Thank you for your detailed question.
>
> On 09.12.2024 14:50, Janusz Krzysztofik wrote:
> > On Monday, 9 December 2024 12:06:33 CET Peter Senna Tschudin wrote:
> >> Hi Janusz,
> >>
> >> On 09.12.2024 10:17, Janusz Krzysztofik wrote:
> >>> On Friday, 6 December 2024 06:45:31 CET Peter Senna Tschudin wrote:
> >>>> Hi Janusz,
> >>>>
> >>>> Thank you for your detailed comments. I appreciate the opportunity
> >>>> to clarify and address your concerns.
> >>>>
> >>>> On 05.12.2024 15:05, Janusz Krzysztofik wrote:
> >>>>> Hi Peter,
> >>>>>
> >>>>> On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote:
> >>>>>> When using igt-runner, collect facts before each test and after the
> >>>>>> last test, and report when facts change. The facts are:
> >>>>>> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
> >>>>>> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
> >>>>>> - Kernel taints: kernel.is_tainted.taint_warn: true
> >>>>>> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
> >>>>>>
> >>>>>> This change imposes little execution overhead and adds just a few
> >>>>>> lines of logging. The facts will be printed on normal igt-runner
> >>>>>> output. Here is a real example from our CI shwoing
> >>>>>> hotreplug-lateclose changing the DRM card number
> >>>>>
> >>>>> Since you give that as an example of how helpful your facts can be, and follow
> >>>>> that with a kernel taint example, that may indicate you think, and users of
> >>>>> your facts may then be mislead having that read, that the taint was related to
> >>>>> the change of card number, while both had nothing to do with each other.
> >>>>
> >>>> Let’s take a step back to define the purpose and scope of igt-facts:
> >>>> - Definition of a fact from the dictionary: A fact is an objectively verifiable
> >>>> piece of information.
> >>>> - Purpose of igt-facts: Track which tests cause changes to the facts.
> >>>>
> >>>> The operation is straightforward: facts are collected before and after each test,
> >>>> and any differences are logged. Here’s an example showing a fact change and a new
> >>>> fact after running hotreplug-lateclose:
> >>>>
> >>>> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
> >>>> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
> >>>>
> >>>> This output highlights the facts without implying causation between them. The
> >>>> tool(and my commit message) neither explains relationships between facts nor
> >>>> misleads users into assuming causation.
> >>>
> >>> For me your commit message does.
> >>
> >> Can you please provide some external support for your claims?
> >>
> >> I am very careful while writing, and I am aware of my own limitations.
> >> After reading what chatGPT wrote about my text, I felt comfortable
> >> about not misleading anyone. Please mind the wording of my
> >> question: I did not hint a desired outcome for chatGPT.
> >>
> >> My Question:
> >> -- // --
> >> Can you please indicate any potential causality relationship
> >> that the following text may offer?
> >>
> >> Here is a real example from our CI shwoing hotreplug-lateclose
> >> changing the DRM card number and tainting the kernel on the
> >> abort path
> >> -- // --
> >>
> >> chatGPT answer:
> >> --//--
> >> The text suggests a potential causality relationship between the
> >> "hotreplug-lateclose" operation and its effects on the system:
> >>
> >> 1. Causal Event: The "hotreplug-lateclose" operation.
> >> This likely refers to a process in the CI(Continuous Integration)
> >> environment related to hotplugging devices and managing late-close
> >> scenarios. Hotplugging involves dynamically adding or removing
> >> hardware components without shutting down the system.
> >>
> >> 2. Effects:
> >>
> >> * Changing the DRM card number: This implies that the operation
> >> affects the enumeration or reordering of DRM (Direct Rendering
> >> Manager) devices, causing the card number to be reassigned.
> >>
> >> * Kernel Tainting on the abort path: Kernel tainting typically
> >> indicates that the kernel has entered a state that is not
> >> guaranteed to be stable or reliable. This could occur due
> >> to improper handling of errors or unexpected conditions
> >> during the "abort path" (likely a failure or cleanup process
> >> in the operation).
> >>
> >> Suggested Causal Chain:
> >> 1. The "hotreplug-lateclose" operation executes in the CI.
> >> 2. This causes a change in DRM card enumeration (potentially due
> >> to how devices are initialized or re-initialized).
> >> 3. On the abort path of this operation, a condition arises that
> >> taints the kernel (possibly due to resource cleanup issues,
> >> unhandled errors, or driver misbehavior).
> >>
> >> This text highlights a possible link between hotreplug-lateclose
> >> handling and instabilities in DRM device management as well as
> >> kernel state integrity.
> >> --//--
> >>
> >>
> >>
> >>>
> >>> Can you please provide a full list of "facts" your code is supposed to handle?
> >>
> >> This is in the commit message already, at the very begining.
> >>
> >>> Can you please explain why you selected just those "facts", not others?
> >>
> >> It was either what was missing, such as a convenient way to learn when
> >> something strange happend as a gpu disappearing from the PCI bus, or
> >> something that we believe may cause errors downstream such as a taint,
> >> and the list of loaded modules.
> >>
> >> For the drm card number association, we belive that there may be a caching
> >> issue: we are trying to figure it out if the drm-reopen cache handles the
> >> change of drm number association well. We have weak information pointing
> >> to a probable problem akin to missing cache invalidation.
>
> [...]
> > *What value do your facts add to CI reporting then?*
>
> It seems that the focus of your feedback has shifted from 'Peter is
> misleading people' to questioning the value of igt-runner. I believe
> it's important to address this change in perspective to ensure
> clarity and productive discussion.
>
> To address your new point, let me draw an analogy: asking "what is
> the value of igt-runner?" is similar to questioning the use of a car
> to reach the supermarket when walking is an option. While walking may
> suffice, the car adds efficiency and convenience, especially for
> larger or more complex "tasks".
>
> In our CI setup, a typical test list includes between 100 and
> 200 tests. As you rightly pointed out, determining which test changed
> a specific fact currently requires significant time and effort. When
> multiple tests modify different facts, the process is not practical.
>
> igt-facts adds value by:
> - Clearly identifying which tests change which facts.
> - Enabling us to know which facts were present and absent before the
> first test runs.
> - Reporting only changes, making the information concise and
> efficient to review.
>
> Moreover, for most developers, the igt-facts output will not be
> obtrusive. In our CI runs, the output is saved only to the
> igt_runner?? file and does not appear in dmesg or the results.json
> file.
>
> And even so, in the few hundred runs that I checked, igt-facts
> typically added between 3 and 16 lines of log to a file that has
> about 1000 lines.
>
> I hope this explanation helps clarify the purpose and value of
> igt-facts.
Unfortunately not. I still can't see how your 4 facts reported by igt_runner
could be useful to me. But let's hear from those who share your points.
Thanks,
Janusz
>
> Thank you,
>
> Peter
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking
2024-12-10 9:50 ` Janusz Krzysztofik
@ 2024-12-10 13:41 ` Knop, Ryszard
0 siblings, 0 replies; 13+ messages in thread
From: Knop, Ryszard @ 2024-12-10 13:41 UTC (permalink / raw)
To: igt-dev@lists.freedesktop.org, peter.senna@linux.intel.com,
janusz.krzysztofik@linux.intel.com
Cc: Das, Nirmoy, Coelho, Luciano, Piecielska, Katarzyna,
Kempczynski, Zbigniew, Piatkowski, Dominik Karol, Summers, Stuart,
Ghimiray, Himal Prasad, De Marchi, Lucas
On Tue, 2024-12-10 at 10:50 +0100, Janusz Krzysztofik wrote:
> Hi Peter,
>
> On Tuesday, 10 December 2024 09:38:59 CET Peter Senna Tschudin wrote:
> > Very good morning Janusz,
> >
> > Thank you for your detailed question.
> >
> > On 09.12.2024 14:50, Janusz Krzysztofik wrote:
> > > On Monday, 9 December 2024 12:06:33 CET Peter Senna Tschudin wrote:
> > > > Hi Janusz,
> > > >
> > > > On 09.12.2024 10:17, Janusz Krzysztofik wrote:
> > > > > On Friday, 6 December 2024 06:45:31 CET Peter Senna Tschudin wrote:
> > > > > > Hi Janusz,
> > > > > >
> > > > > > Thank you for your detailed comments. I appreciate the opportunity
> > > > > > to clarify and address your concerns.
> > > > > >
> > > > > > On 05.12.2024 15:05, Janusz Krzysztofik wrote:
> > > > > > > Hi Peter,
> > > > > > >
> > > > > > > On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote:
> > > > > > > > When using igt-runner, collect facts before each test and after the
> > > > > > > > last test, and report when facts change. The facts are:
> > > > > > > > - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
> > > > > > > > - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
> > > > > > > > - Kernel taints: kernel.is_tainted.taint_warn: true
> > > > > > > > - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
> > > > > > > >
> > > > > > > > This change imposes little execution overhead and adds just a few
> > > > > > > > lines of logging. The facts will be printed on normal igt-runner
> > > > > > > > output. Here is a real example from our CI shwoing
> > > > > > > > hotreplug-lateclose changing the DRM card number
> > > > > > >
> > > > > > > Since you give that as an example of how helpful your facts can be, and follow
> > > > > > > that with a kernel taint example, that may indicate you think, and users of
> > > > > > > your facts may then be mislead having that read, that the taint was related to
> > > > > > > the change of card number, while both had nothing to do with each other.
> > > > > >
> > > > > > Let’s take a step back to define the purpose and scope of igt-facts:
> > > > > > - Definition of a fact from the dictionary: A fact is an objectively verifiable
> > > > > > piece of information.
> > > > > > - Purpose of igt-facts: Track which tests cause changes to the facts.
> > > > > >
> > > > > > The operation is straightforward: facts are collected before and after each test,
> > > > > > and any differences are logged. Here’s an example showing a fact change and a new
> > > > > > fact after running hotreplug-lateclose:
> > > > > >
> > > > > > [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
> > > > > > [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
> > > > > >
> > > > > > This output highlights the facts without implying causation between them. The
> > > > > > tool(and my commit message) neither explains relationships between facts nor
> > > > > > misleads users into assuming causation.
> > > > >
> > > > > For me your commit message does.
> > > >
> > > > Can you please provide some external support for your claims?
> > > >
> > > > I am very careful while writing, and I am aware of my own limitations.
> > > > After reading what chatGPT wrote about my text, I felt comfortable
> > > > about not misleading anyone. Please mind the wording of my
> > > > question: I did not hint a desired outcome for chatGPT.
> > > >
> > > > My Question:
> > > > -- // --
> > > > Can you please indicate any potential causality relationship
> > > > that the following text may offer?
> > > >
> > > > Here is a real example from our CI shwoing hotreplug-lateclose
> > > > changing the DRM card number and tainting the kernel on the
> > > > abort path
> > > > -- // --
> > > >
> > > > chatGPT answer:
> > > > --//--
> > > > The text suggests a potential causality relationship between the
> > > > "hotreplug-lateclose" operation and its effects on the system:
> > > >
> > > > 1. Causal Event: The "hotreplug-lateclose" operation.
> > > > This likely refers to a process in the CI(Continuous Integration)
> > > > environment related to hotplugging devices and managing late-close
> > > > scenarios. Hotplugging involves dynamically adding or removing
> > > > hardware components without shutting down the system.
> > > >
> > > > 2. Effects:
> > > >
> > > > * Changing the DRM card number: This implies that the operation
> > > > affects the enumeration or reordering of DRM (Direct Rendering
> > > > Manager) devices, causing the card number to be reassigned.
> > > >
> > > > * Kernel Tainting on the abort path: Kernel tainting typically
> > > > indicates that the kernel has entered a state that is not
> > > > guaranteed to be stable or reliable. This could occur due
> > > > to improper handling of errors or unexpected conditions
> > > > during the "abort path" (likely a failure or cleanup process
> > > > in the operation).
> > > >
> > > > Suggested Causal Chain:
> > > > 1. The "hotreplug-lateclose" operation executes in the CI.
> > > > 2. This causes a change in DRM card enumeration (potentially due
> > > > to how devices are initialized or re-initialized).
> > > > 3. On the abort path of this operation, a condition arises that
> > > > taints the kernel (possibly due to resource cleanup issues,
> > > > unhandled errors, or driver misbehavior).
> > > >
> > > > This text highlights a possible link between hotreplug-lateclose
> > > > handling and instabilities in DRM device management as well as
> > > > kernel state integrity.
> > > > --//--
(Really not a fan of posting AI slop as an argument for how well other
people can understand your position, especially if you ask it to
perform any sort of reasoning...)
> > > >
> > > >
> > > > >
> > > > > Can you please provide a full list of "facts" your code is supposed to handle?
> > > >
> > > > This is in the commit message already, at the very begining.
> > > >
> > > > > Can you please explain why you selected just those "facts", not others?
> > > >
> > > > It was either what was missing, such as a convenient way to learn when
> > > > something strange happend as a gpu disappearing from the PCI bus, or
> > > > something that we believe may cause errors downstream such as a taint,
> > > > and the list of loaded modules.
> > > >
> > > > For the drm card number association, we belive that there may be a caching
> > > > issue: we are trying to figure it out if the drm-reopen cache handles the
> > > > change of drm number association well. We have weak information pointing
> > > > to a probable problem akin to missing cache invalidation.
> >
> > [...]
> > > *What value do your facts add to CI reporting then?*
> >
> > It seems that the focus of your feedback has shifted from 'Peter is
> > misleading people' to questioning the value of igt-runner. I believe
> > it's important to address this change in perspective to ensure
> > clarity and productive discussion.
> >
> > To address your new point, let me draw an analogy: asking "what is
> > the value of igt-runner?" is similar to questioning the use of a car
> > to reach the supermarket when walking is an option. While walking may
> > suffice, the car adds efficiency and convenience, especially for
> > larger or more complex "tasks".
> >
> > In our CI setup, a typical test list includes between 100 and
> > 200 tests. As you rightly pointed out, determining which test changed
> > a specific fact currently requires significant time and effort. When
> > multiple tests modify different facts, the process is not practical.
> >
> > igt-facts adds value by:
> > - Clearly identifying which tests change which facts.
> > - Enabling us to know which facts were present and absent before the
> > first test runs.
> > - Reporting only changes, making the information concise and
> > efficient to review.
> >
> > Moreover, for most developers, the igt-facts output will not be
> > obtrusive. In our CI runs, the output is saved only to the
> > igt_runner?? file and does not appear in dmesg or the results.json
> > file.
> >
> > And even so, in the few hundred runs that I checked, igt-facts
> > typically added between 3 and 16 lines of log to a file that has
> > about 1000 lines.
> >
> > I hope this explanation helps clarify the purpose and value of
> > igt-facts.
>
> Unfortunately not. I still can't see how your 4 facts reported by igt_runner
> could be useful to me. But let's hear from those who share your points.
Here's 3 reasons why they seem useful to me:
- CI itself performs some of these checks in the pipelines (loaded
modules, kernel taints before testing starts, likely more in the
future), so this could simplify fetching this data if we just parsed
the IGT lsfacts output.
- Simplicity. I can take a quick look into the runner logs, see a bunch
of successful tests, notice one of the tests left the system tainted
and the GPU dropped off the bus - uh-huh, at least I can quickly filter
down the interesting logs to what happened in tests near the reported
fact changes. This is nothing revolutionary that I could not divine
from dmesgs/outs/errs one way or another, but it sure is easy and
convenient, especially for people who are not kernel developers but
have to review and debug this stuff anyways.
- It also makes it easier to notice passing tests that leave the system
in an unclean state - for example, if a display drops off after a
successful test and what follows does not immediately use displays, so
some runs appear healthy, others less so (=> reasoning for noise runs).
So yes, I'd like to see this framework in IGTs. It's not strictly
necessary, but it would make our lives a little easier.
Thanks, Ryszard
> Thanks,
> Janusz
>
> >
> > Thank you,
> >
> > Peter
> >
>
>
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-12-10 13:42 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <5a111c35-7245-4ada-a2a0-3fd0fd5bbeab@linux.intel.com>
2024-12-05 14:05 ` [PATCH i-g-t v10] igt-runner fact checking Janusz Krzysztofik
2024-12-06 5:45 ` Peter Senna Tschudin
2024-12-09 9:17 ` Janusz Krzysztofik
2024-12-09 11:06 ` Peter Senna Tschudin
2024-12-09 13:50 ` Janusz Krzysztofik
2024-12-10 8:38 ` Peter Senna Tschudin
2024-12-10 9:50 ` Janusz Krzysztofik
2024-12-10 13:41 ` Knop, Ryszard
2024-11-02 11:37 [PATCH i-g-t] " Peter Senna Tschudin
2024-12-05 4:54 ` [PATCH i-g-t v10] " Peter Senna Tschudin
2024-12-05 9:08 ` Piatkowski, Dominik Karol
2024-12-06 11:42 ` Kamil Konieczny
2024-12-06 13:16 ` Peter Senna Tschudin
2024-12-06 16:46 ` Kamil Konieczny
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox