* Re: [PATCH i-g-t v10] igt-runner fact checking [not found] <5a111c35-7245-4ada-a2a0-3fd0fd5bbeab@linux.intel.com> @ 2024-12-05 14:05 ` Janusz Krzysztofik 2024-12-06 5:45 ` Peter Senna Tschudin 0 siblings, 1 reply; 13+ messages in thread From: Janusz Krzysztofik @ 2024-12-05 14:05 UTC (permalink / raw) To: igt-dev@lists.freedesktop.org, Peter Senna Tschudin Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi, luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray, dominik.karol.piatkowski, katarzyna.piecielska, Janusz Krzysztofik Hi Peter, On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote: > When using igt-runner, collect facts before each test and after the > last test, and report when facts change. The facts are: > - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20) > - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1 > - Kernel taints: kernel.is_tainted.taint_warn: true > - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true > > This change imposes little execution overhead and adds just a few > lines of logging. The facts will be printed on normal igt-runner > output. Here is a real example from our CI shwoing > hotreplug-lateclose changing the DRM card number Since you give that as an example of how helpful your facts can be, and follow that with a kernel taint example, that may indicate you think, and users of your facts may then be mislead having that read, that the taint was related to the change of card number, while both had nothing to do with each other. Please add something like ', which is expected,' to your description. Changed card number is expected, and that's nothing wrong with that. The old, still open instance of the driver still exists. It is expected to be already decoupled from hardware, but it still occupies its minor device number. Then, new instance of the driver, attached to the same hardware, gets first free minor number. Refresh of IGT device filter before health check can perfectly handle that case. > and tainting the > kernel on the abort path: No, hotreplug-lateclose doesn't taint the kernel. That's wrong conclusion from what was reported, or wrong wording at least. Kernel taint (or lockdep not active) must have been a result of driver issues (or maybe well known limitations) exposed by the test. Then igt_runner took the abort path since it detected those unhealthy conditions. Again, users of your facts may be mislead if your messages can really suggest what you tell us about what you think has happened. The whole idea of testing i915 resistance to hot unplug scenarios came from perceived cases of VFs potentially disappearing from VMs. Some significant effort was put on that feature of the driver, but since the 'hot' test variants were never unblocked in CI, things tended to get worse and worse over time while new driver features that didn't care for hot unplug capability were added. Thanks, Janusz > > [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose) > [245.383596] Starting subtest: hotreplug-lateclose > [249.843361] Aborting: Lockdep not active > [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 > [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true > [249.859075] Closing watchdogs > > CC: Ryszard Knop <ryszard.knop@intel.com> > CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> > CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> > CC: Lucas De Marchi <lucas.demarchi@intel.com> > CC: luciano.coelho@intel.com > CC: nirmoy.das@intel.com > CC: stuart.summers@intel.com > CC: himal.prasad.ghimiray@intel.com > CC: dominik.karol.piatkowski@intel.com > CC: katarzyna.piecielska@intel.com > Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com> > --- > v10: > - fix memory leaks from asprintf (Thank you Dominik Karol!) > - fix comments for consistency (Thank you Dominik Karol!) > > v9: > - do not report new hardware when loading/unloading kmod changes > the string of the GPU name. I accidentally reintroduced this > issue when refactoring to use linked lists. > - add tools/lsfacts: 9 lines of code that print either the facts > or that no facts were found. > - fix code comments describing functions > - fix white space issues > > v8: > - fix white space issues > > v7: > - refactor to use linked lists provided by igt_lists > - Added function arguments to code comments > - updated commit message > > v6: > - sort includes in igt_facts.c alphabetically > - add facts for kernel taints using igt_kernel_tainted() and > igt_explain_taints() > > v5: > - fix the broken patch format from v4 > > v4: > - fix a bug on delete_fact() > - drop glib and calls to g_ functions > - change commit message to indicate that report only on fact changes > - use consistent format for reporting changes > - fix SPDX header format > > v3: > - refreshed commit message > - changed format SPDX string > - removed license text > - replace last_test assignment when null by two ternary operators > - added function descriptions following example found elsewhere in > the code > - added igt_assert to catch failures to realloc() > > v2: > - add lib/tests/igt_facts.c for basic unit testing > - bugfix: do not report a new gpu when the driver changes the gpu name > - bugfix: do not report the pci_id twice on the gpu name > > lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++ > lib/igt_facts.h | 47 +++ > lib/meson.build | 1 + > lib/tests/igt_facts.c | 15 + > lib/tests/meson.build | 1 + > runner/executor.c | 10 + > tools/lsfacts.c | 25 ++ > tools/meson.build | 1 + > 8 files changed, 855 insertions(+) > create mode 100644 lib/igt_facts.c > create mode 100644 lib/igt_facts.h > create mode 100644 lib/tests/igt_facts.c > create mode 100644 tools/lsfacts.c > > diff --git a/lib/igt_facts.c b/lib/igt_facts.c > new file mode 100644 > index 000000000..4749d3417 > --- /dev/null > +++ b/lib/igt_facts.c > @@ -0,0 +1,755 @@ > +// SPDX-License-Identifier: MIT > +// Copyright © 2024 Intel Corporation > + > +#include <ctype.h> > +#include <libudev.h> > +#include <stdio.h> > +#include <sys/time.h> > +#include <time.h> > + > +#include "igt_core.h" > +#include "igt_device_scan.h" > +#include "igt_facts.h" > +#include "igt_kmod.h" > +#include "igt_list.h" > +#include "igt_taints.h" > + > +static struct igt_list_head igt_facts_list_drm_card_head; > +static struct igt_list_head igt_facts_list_kmod_head; > +static struct igt_list_head igt_facts_list_ktaint_head; > +static struct igt_list_head igt_facts_list_pci_gpu_head; > + > + > +/** > + * igt_facts_lists_init: > + * > + * Initialize igt_facts linked lists. > + * > + * Returns: void > + */ > +void igt_facts_lists_init(void) > +{ > + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head); > + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head); > + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head); > + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head); > +} > + > + > +/** > + * igt_facts_log: > + * @last_test: name of the test that triggered the fact > + * @name: name of the fact > + * @new_value: new value of the fact > + * @old_value: old value of the fact > + * > + * Reports fact changes: > + * - new fact: if old_value is NULL and new_value is not NULL > + * - deleted fact: if new_value is NULL and old_value is not NULL > + * - changed fact: if new_value is different from old_value > + * > + * Returns: void > + */ > +static void igt_facts_log(const char *last_test, const char *name, > + const char *new_value, const char *old_value) > +{ > + struct timespec uptime_ts; > + char *uptime = NULL; > + const char *before_tests = "before any test"; > + > + if (old_value == NULL && new_value == NULL) > + return; > + > + if (clock_gettime(CLOCK_BOOTTIME, &uptime_ts) != 0) > + return; > + > + asprintf(&uptime, > + "%ld.%06ld", > + uptime_ts.tv_sec, > + uptime_ts.tv_nsec / 1000); > + > + /* New fact */ > + if (old_value == NULL && new_value != NULL) { > + igt_info("[%s] [FACT %s] new: %s: %s\n", > + uptime, > + last_test ? last_test : before_tests, > + name, > + new_value); > + goto out; > + } > + > + /* Update fact */ > + if (old_value != NULL && new_value != NULL) { > + igt_info("[%s] [FACT %s] changed: %s: %s -> %s\n", > + uptime, > + last_test ? last_test : before_tests, > + name, > + old_value, > + new_value); > + goto out; > + } > + > + /* Deleted fact */ > + if (old_value != NULL && new_value == NULL) { > + igt_info("[%s] [FACT %s] deleted: %s: %s\n", > + uptime, > + last_test ? last_test : before_tests, > + name, > + old_value); > + goto out; > + } > + > +out: > + free(uptime); > +} > + > +/** > + * igt_facts_list_get: > + * @name: name of the fact to be added > + * @head: head of the list > + * > + * Get a fact from the list. > + * > + * Returns: pointer to the fact if found, NULL otherwise > + * > + */ > +static igt_fact *igt_facts_list_get(const char *name, > + struct igt_list_head *head) > +{ > + igt_fact *fact = NULL; > + > + if (igt_list_empty(head)) > + return NULL; > + > + igt_list_for_each_entry(fact, head, link) { > + if (strcmp(fact->name, name) == 0) > + return fact; > + } > + return NULL; > +} > + > +/** > + * igt_facts_list_del: > + * @name: name of the fact to be added > + * @head: head of the list > + * @last_test: name of the last test > + * @log: bool indicating if the delete operation should be logged > + * > + * Delete a fact from the list. > + * > + * Returns: bool indicating if fact was deleted from the list > + * > + */ > +static bool igt_facts_list_del(const char *name, > + struct igt_list_head *head, > + const char *last_test, > + bool log) > +{ > + igt_fact *fact = NULL; > + > + if (igt_list_empty(head)) > + return false; > + > + igt_list_for_each_entry(fact, head, link) { > + if (strcmp(fact->name, name) == 0) { > + if (log) > + igt_facts_log(last_test, fact->name, > + NULL, fact->value); > + > + igt_list_del(&fact->link); > + free(fact->name); > + free(fact->value); > + free(fact->last_test); > + free(fact); > + return true; > + } > + } > + return false; > +} > + > +/** > + * igt_facts_list_add: > + * @name: name of the fact to be added > + * @value: value of the fact to be added > + * @last_test: name of the last test > + * @head: head of the list > + * > + * Returns: bool indicating if fact was added to the list > + * > + */ > +static bool igt_facts_list_add(const char *name, > + const char *value, > + const char *last_test, > + struct igt_list_head *head) > +{ > + igt_fact *new_fact = NULL, *old_fact = NULL; > + bool logged = false; > + > + if (name == NULL || value == NULL) > + return false; > + > + old_fact = igt_facts_list_get(name, head); > + if (old_fact) { > + if (strcmp(old_fact->value, value) == 0) { > + old_fact->present = true; > + return false; > + } > + igt_facts_log(last_test, name, value, old_fact->value); > + logged = true; > + igt_facts_list_del(name, head, last_test, false); > + } > + > + new_fact = malloc(sizeof(igt_fact)); > + if (new_fact == NULL) > + return false; > + > + new_fact->name = strdup(name); > + new_fact->value = strdup(value); > + new_fact->last_test = last_test ? strdup(last_test) : NULL; > + new_fact->present = true; > + > + if (!logged) > + igt_facts_log(last_test, name, value, NULL); > + > + igt_list_add(&new_fact->link, head); > + > + return true; > +} > + > +/** > + * igt_facts_list_mark: > + * @head: head of the list > + * > + * Mark all facts in the list as not present. Opted for the mark and sweep > + * design pattern due to its simplicity and efficiency. > + * > + * Returns: void > + */ > +static void igt_facts_list_mark(struct igt_list_head *head) > +{ > + igt_fact *fact = NULL; > + > + if (igt_list_empty(head)) > + return; > + > + igt_list_for_each_entry(fact, head, link) > + fact->present = false; > +} > + > +/** > + * igt_facts_list_sweep: > + * @head: head of the list > + * @last_test: name of the last test > + * > + * Sweep the list and delete all facts that are not present. Opted for the mark > + * and sweep design pattern due to its simplicity and efficiency. > + * > + * Returns: void > + */ > +static void igt_facts_list_sweep(struct igt_list_head *head, > + const char *last_test) > +{ > + igt_fact *fact = NULL, *tmp = NULL; > + > + if (igt_list_empty(head)) > + return; > + > + igt_list_for_each_entry_safe(fact, tmp, head, link) > + if (!fact->present) > + igt_facts_list_del(fact->name, head, last_test, true); > +} > + > +/** > + * igt_facts_list_mark_and_sweep: > + * @head: head of the list > + * > + * Clean up the list using mark and sweep. Opted for the mark and sweep > + * design pattern due to its simplicity and efficiency. > + * > + * Returns: void > + */ > +static void igt_facts_list_mark_and_sweep(struct igt_list_head *head) > +{ > + igt_facts_list_mark(head); > + igt_facts_list_sweep(head, NULL); > +} > + > +/** > + * igt_facts_are_all_lists_empty: > + * > + * Returns true if all lists are empty. Used by the tool lsfacts. > + * > + * Returns: bool > + */ > +bool igt_facts_are_all_lists_empty(void) > +{ > + return igt_list_empty(&igt_facts_list_drm_card_head) && > + igt_list_empty(&igt_facts_list_kmod_head) && > + igt_list_empty(&igt_facts_list_ktaint_head) && > + igt_list_empty(&igt_facts_list_pci_gpu_head); > +} > + > +/** > + * igt_facts_scan_pci_gpus: > + * @last_test: name of the last test > + * > + * This function scans the pci bus for gpus using udev. It uses > + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to > + * update igt_facts_list_pci_gpu_head. > + * > + * Returns: void > + */ > +static void igt_facts_scan_pci_gpus(const char *last_test) > +{ > + static struct igt_list_head *head = &igt_facts_list_pci_gpu_head; > + struct udev *udev = NULL; > + struct udev_enumerate *enumerate = NULL; > + struct udev_list_entry *devices, *dev_list_entry; > + struct igt_device_card card; > + char pcistr[10]; > + int ret; > + char *factname = NULL; > + char *factvalue = NULL; > + > + udev = udev_new(); > + if (!udev) { > + igt_warn("Failed to create udev context\n"); > + return; > + } > + > + enumerate = udev_enumerate_new(udev); > + if (!enumerate) { > + igt_warn("Failed to create udev enumerate\n"); > + udev_unref(udev); > + return; > + } > + > + ret = udev_enumerate_add_match_subsystem(enumerate, "pci"); > + if (ret < 0) > + goto out; > + > + ret = udev_enumerate_add_match_property(enumerate, > + "PCI_CLASS", > + "30000"); > + if (ret < 0) > + goto out; > + > + ret = udev_enumerate_add_match_property(enumerate, > + "PCI_CLASS", > + "38000"); > + if (ret < 0) > + goto out; > + > + ret = udev_enumerate_scan_devices(enumerate); > + if (ret < 0) > + goto out; > + > + devices = udev_enumerate_get_list_entry(enumerate); > + if (!devices) > + goto out; > + > + igt_facts_list_mark(head); > + > + udev_list_entry_foreach(dev_list_entry, devices) { > + const char *path; > + struct udev_device *udev_dev; > + struct udev_list_entry *entry; > + char *model = NULL; > + char *codename = NULL; > + igt_fact *old_fact = NULL; > + > + path = udev_list_entry_get_name(dev_list_entry); > + udev_dev = udev_device_new_from_syspath(udev, path); > + if (!udev_dev) > + continue; > + > + /* Strip path to only the content after the last / */ > + path = strrchr(path, '/'); > + if (path) > + path++; > + else > + path = "unknown"; > + > + strcpy(card.pci_slot_name, "-"); > + > + entry = udev_device_get_properties_list_entry(udev_dev); > + while (entry) { > + const char *name = udev_list_entry_get_name(entry); > + const char *value = udev_list_entry_get_value(entry); > + > + entry = udev_list_entry_get_next(entry); > + if (!strcmp(name, "ID_MODEL_FROM_DATABASE")) > + model = strdup(value); > + else if (!strcmp(name, "PCI_ID")) > + igt_assert_eq(sscanf(value, "%hx:%hx", > + &card.pci_vendor, > + &card.pci_device), 2); > + } > + snprintf(pcistr, sizeof(pcistr), "%04x:%04x", > + card.pci_vendor, card.pci_device); > + codename = igt_device_get_pretty_name(&card, false); > + > + /* Set codename to null if it is the same string as pci_id */ > + if (codename && strcmp(pcistr, codename) == 0) { > + free(codename); > + codename = NULL; > + } > + asprintf(&factname, "%s.%s", pci_gpu_fact, path); > + asprintf(&factvalue, > + "%s %s %s", > + pcistr, > + codename ? codename : "", > + model ? model : ""); > + > + /** > + * Loading and unloading the kmod may change the human > + * readeable string in value. Do not change value if the > + * pci id is the same. > + */ > + old_fact = igt_facts_list_get(factname, head); > + if (old_fact && strncmp(old_fact->value, factvalue, 9) == 0) > + old_fact->present = true; > + else > + igt_facts_list_add(factname, factvalue, last_test, head); > + > + free(codename); > + free(model); > + free(factname); > + free(factvalue); > + udev_device_unref(udev_dev); > + } > + > + igt_facts_list_sweep(head, last_test); > + > +out: > + udev_enumerate_unref(enumerate); > + udev_unref(udev); > +} > + > +/** > + * igt_facts_scan_pci_drm_cards: > + * @last_test: name of the last test > + * > + * This function scans the pci bus for drm cards using udev. It uses the > + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to > + * update igt_facts_list_drm_card_head. > + * > + * Returns: void > + */ > +static void igt_facts_scan_pci_drm_cards(const char *last_test) > +{ > + static struct igt_list_head *head = &igt_facts_list_drm_card_head; > + struct udev *udev = NULL; > + struct udev_enumerate *enumerate = NULL; > + struct udev_list_entry *devices, *dev_list_entry; > + int ret; > + char *factname = NULL; > + char *factvalue = NULL; > + > + udev = udev_new(); > + if (!udev) > + return; > + > + enumerate = udev_enumerate_new(udev); > + if (!enumerate) { > + udev_unref(udev); > + return; > + } > + > + ret = udev_enumerate_add_match_subsystem(enumerate, "drm"); > + if (ret < 0) > + goto out; > + > + ret = udev_enumerate_scan_devices(enumerate); > + if (ret < 0) > + goto out; > + > + devices = udev_enumerate_get_list_entry(enumerate); > + if (!devices) > + goto out; > + > + ret = udev_enumerate_add_match_subsystem(enumerate, "drm"); > + if (ret < 0) > + goto out; > + > + ret = udev_enumerate_scan_devices(enumerate); > + if (ret < 0) > + goto out; > + > + devices = udev_enumerate_get_list_entry(enumerate); > + if (!devices) > + goto out; > + > + igt_facts_list_mark(head); > + > + udev_list_entry_foreach(dev_list_entry, devices) { > + const char *path; > + struct udev_device *drm_dev, *pci_dev; > + const char *drm_name, *pci_addr; > + > + path = udev_list_entry_get_name(dev_list_entry); > + drm_dev = udev_device_new_from_syspath(udev, path); > + if (!drm_dev) > + continue; > + > + drm_name = udev_device_get_sysname(drm_dev); > + /* Filter the device by name. Want devices such as card0 and card1. > + * If the device has '-' in the name, contine > + */ > + if (strncmp(drm_name, "card", 4) != 0 || > + strchr(drm_name, '-') != NULL) { > + udev_device_unref(drm_dev); > + continue; > + } > + > + /* Get the pci address of the gpu associated with the drm_dev*/ > + pci_dev = udev_device_get_parent_with_subsystem_devtype(drm_dev, > + "pci", > + NULL); > + if (pci_dev) { > + pci_addr = udev_device_get_sysattr_value(pci_dev, > + "address"); > + if (!pci_addr) > + pci_addr = udev_device_get_sysname(pci_dev); > + } else { > + /* Some GPUs are platform devices. Ignore them. */ > + pci_addr = NULL; > + udev_device_unref(drm_dev); > + continue; > + } > + > + asprintf(&factname, "%s.%s", drm_card_fact, pci_addr); > + asprintf(&factvalue, "%s", drm_name); > + > + igt_facts_list_add(factname, factvalue, last_test, head); > + > + free(factname); > + free(factvalue); > + udev_device_unref(drm_dev); > + } > + > + igt_facts_list_sweep(head, last_test); > + > +out: > + udev_enumerate_unref(enumerate); > + udev_unref(udev); > +} > + > +/** > + * igt_facts_scan_kernel_taints: > + * @last_test: name of the last test > + * > + * This function scans for kernel taints using igt_kernel_tainted() and > + * igt_explain_taints(). It will cut off the explanation keeping only the > + * taint name. > + * > + * Returns: void > + */ > +static void igt_facts_scan_kernel_taints(const char *last_test) > +{ > + static struct igt_list_head *head = &igt_facts_list_ktaint_head; > + unsigned long taints = 0; > + const char *reason = NULL; > + char *taint_name = NULL; > + char *fact_name = NULL; > + > + taints = igt_kernel_tainted(&taints); > + /* For testing, set all bits to 1 > + * taints = 0xFFFFFFFF; > + */ > + > + > + igt_facts_list_mark(head); > + > + while ((reason = igt_explain_taints(&taints)) != NULL) { > + /* Cut at the ':' to get only the taint name */ > + taint_name = strtok(strdup(reason), ":"); > + if (!taint_name) > + continue; > + > + /* Lowercase taint_name */ > + for (int i = 0; taint_name[i]; i++) > + taint_name[i] = tolower(taint_name[i]); > + > + asprintf(&fact_name, "%s.%s", ktaint_fact, taint_name); > + igt_facts_list_add(fact_name, "true", last_test, head); > + > + free(taint_name); > + free(fact_name); > + } > + > + igt_facts_list_sweep(head, last_test); > +} > + > + > +/** > + * igt_facts_scan_kernel_loaded_kmods: > + * @last_test: name of the last test > + * > + * This function scans for loaded kmods using igt_fact_kmod_list and > + * igt_kmod_is_loaded(). > + * > + * Returns: void > + */ > +static void igt_facts_scan_kernel_loaded_kmods(const char *last_test) > +{ > + static struct igt_list_head *head = &igt_facts_list_kmod_head; > + char *name = NULL; > + > + igt_facts_list_mark(head); > + > + /* Iterate over igt_fact_kmod_list[] until the element contains "\0" */ > + for (int i = 0; strcmp(igt_fact_kmod_list[i], "\0") != 0; i++) { > + asprintf(&name, "%s.%s", kmod_fact, igt_fact_kmod_list[i]); > + if (igt_kmod_is_loaded(igt_fact_kmod_list[i])) > + igt_facts_list_add(name, "true", last_test, head); > + > + free(name); > + } > + > + igt_facts_list_sweep(head, last_test); > +} > + > +/** > + * igt_facts: > + * @last_test: name of the last test > + * > + * Call this function where you want to gather and report facts. > + * > + * Returns: void > + */ > +void igt_facts(const char *last_test) > +{ > + igt_facts_scan_pci_gpus(last_test); > + igt_facts_scan_pci_drm_cards(last_test); > + igt_facts_scan_kernel_taints(last_test); > + igt_facts_scan_kernel_loaded_kmods(last_test); > + > + fflush(stdout); > + fflush(stderr); > +} > + > +/* > + * Testing > + * > + * Defined here to keep most of the functions static > + * > + */ > + > +/** > + * igt_facts_test_add_get: > + * @head: head of the list > + * > + * Tests igt_facts_list_add and igt_facts_list_get. > + * > + * Returns: void > + */ > +static void igt_facts_test_add_get(struct igt_list_head *head) > +{ > + igt_fact *fact = NULL; > + bool ret; > + const char *name = "hardware.pci.gpu_at_addr.0000:00:02.0"; > + const char *value = "8086:64a0 Intel Lunarlake (Gen20)"; > + const char *last_test = NULL; > + > + ret = igt_facts_list_add(name, value, last_test, head); > + igt_assert(ret == true); > + > + // Assert that there is one element in the linked list > + igt_assert_eq(igt_list_length(head), 1); > + > + // Assert that the element in the linked list is the one we added > + fact = igt_facts_list_get(name, head); > + igt_assert(fact != NULL); > + igt_assert_eq(strcmp(fact->name, name), 0); > + igt_assert_eq(strcmp(fact->value, value), 0); > + igt_assert(fact->present == true); > + igt_assert(fact->last_test == NULL); > +} > + > +/** > + * igt_facts_test_mark_and_sweep: > + * @head: head of the list > + * > + * - Add 3 elements to the list and mark them as not present. > + * - Update two of the elements and mark them as present. > + * - Sweep the list and assert that > + * - Only the two updated elements are present > + * - The third element was deleted > + * > + * Returns: void > + */ > +static void igt_facts_test_mark_and_sweep(struct igt_list_head *head) > +{ > + igt_fact *fact = NULL; > + const char *name1 = "hardware.pci.gpu_at_addr.0000:00:02.0"; > + const char *value1 = "8086:64a0 Intel Lunarlake (Gen20)"; > + const char *name2 = "hardware.pci.gpu_at_addr.0000:00:03.0"; > + const char *value2 = "8086:64a1 Intel Lunarlake (Gen21)"; > + const char *name3 = "hardware.pci.gpu_at_addr.0000:00:04.0"; > + const char *value3 = "8086:64a2 Intel Lunarlake (Gen22)"; > + > + igt_facts_list_add(name1, value1, NULL, head); > + igt_facts_list_add(name2, value2, NULL, head); > + igt_facts_list_add(name3, value3, NULL, head); > + > + igt_facts_list_mark(head); > + > + igt_facts_list_add(name1, value1, NULL, head); > + igt_facts_list_add(name2, value2, NULL, head); > + > + igt_facts_list_sweep(head, NULL); > + > + // Assert that there are two elements in the linked list > + igt_assert_eq(igt_list_length(head), 2); > + > + // Assert that the two updated elements are present > + fact = igt_facts_list_get(name1, head); > + igt_assert(fact != NULL); > + igt_assert(fact->present == true); > + > + fact = igt_facts_list_get(name2, head); > + igt_assert(fact != NULL); > + igt_assert(fact->present == true); > + > + // Assert that the third element was deleted > + fact = igt_facts_list_get(name3, head); > + igt_assert(fact == NULL); > +} > + > +/** > + * igt_facts_test: > + * > + * Main function for testing the igt_facts module > + * > + * Returns: bool indicating if the tests passed > + */ > +void igt_facts_test(void) > +{ > + const char *last_test = "Unit Testing"; > + > + igt_facts_lists_init(); > + > + /* Assert that all lists are empty */ > + igt_assert(igt_list_empty(&igt_facts_list_kmod_head)); > + igt_assert(igt_list_empty(&igt_facts_list_ktaint_head)); > + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head)); > + igt_assert(igt_list_empty(&igt_facts_list_drm_card_head)); > + > + /* Assert that add and get work. Will add one element to the list */ > + igt_facts_test_add_get(&igt_facts_list_pci_gpu_head); > + > + /* Assert that igt_facts_list_mark_and_sweep() cleans up the list */ > + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == false); > + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head); > + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == true); > + > + /* Test the mark and sweep pattern used to delete elements > + * from the list > + */ > + igt_facts_test_mark_and_sweep(&igt_facts_list_pci_gpu_head); > + > + /* Clean up the list and call igt_facts(). This should not crash */ > + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head); > + igt_facts(last_test); > +} > diff --git a/lib/igt_facts.h b/lib/igt_facts.h > new file mode 100644 > index 000000000..e4adca3fb > --- /dev/null > +++ b/lib/igt_facts.h > @@ -0,0 +1,47 @@ > +/* SPDX-License-Identifier: MIT > + * Copyright © 2024 Intel Corporation > + */ > + > +#include <stdbool.h> > + > +#include "igt_list.h" > + > + > +/* igt_fact: > + * @name: name of the fact > + * @value: value of the fact > + * @last_test: name of the test that triggered the fact > + * @present: bool indicating if fact is present. Used for deleting facts from > + * the list. > + * @link: link to the next fact > + * > + * A fact is a piece of information that can be used to determine the state of > + * the system. > + * > + */ > +typedef struct { > + char *name; > + char *value; > + char *last_test; > + bool present; /* For mark and seep */ > + struct igt_list_head link; > +} igt_fact; > + > +const char *igt_fact_kmod_list[] = { > + "amdgpu", > + "i915", > + "nouveau", > + "radeon", > + "xe", > + "\0" > +}; > + > +const char *kmod_fact = "kernel.kmod_is_loaded"; /* true or false */ > +const char *ktaint_fact = "kernel.is_tainted"; /* taint name: taint_warn */ > +const char *pci_gpu_fact = "hardware.pci.gpu_at_addr"; /* id vendor model */ > +const char *drm_card_fact = "hardware.pci.drm_card_at_addr"; /* cardX */ > + > +void igt_facts_lists_init(void); > +void igt_facts(const char *last_test); > +bool igt_facts_are_all_lists_empty(void); > +void igt_facts_test(void); /* For unit testing only */ > diff --git a/lib/meson.build b/lib/meson.build > index c3556a921..c44ca2b5a 100644 > --- a/lib/meson.build > +++ b/lib/meson.build > @@ -18,6 +18,7 @@ lib_sources = [ > 'i915/i915_crc.c', > 'igt_collection.c', > 'igt_color_encoding.c', > + 'igt_facts.c', > 'igt_crc.c', > 'igt_debugfs.c', > 'igt_device.c', > diff --git a/lib/tests/igt_facts.c b/lib/tests/igt_facts.c > new file mode 100644 > index 000000000..7fa9d0f22 > --- /dev/null > +++ b/lib/tests/igt_facts.c > @@ -0,0 +1,15 @@ > +// SPDX-License-Identifier: MIT > +// Copyright © 2024 Intel Corporation > + > +#include <stdbool.h> > + > +#include "igt_core.h" > +#include "igt_facts.h" > + > +/* Tests are not defined here so we can keep most of the functions static */ > + > +igt_simple_main > +{ > + igt_info("Running igt_facts_test\n"); > + igt_facts_test(); > +} > diff --git a/lib/tests/meson.build b/lib/tests/meson.build > index df8092638..1ce19f63c 100644 > --- a/lib/tests/meson.build > +++ b/lib/tests/meson.build > @@ -8,6 +8,7 @@ lib_tests = [ > 'igt_dynamic_subtests', > 'igt_edid', > 'igt_exit_handler', > + 'igt_facts', > 'igt_fork', > 'igt_fork_helper', > 'igt_hook', > diff --git a/runner/executor.c b/runner/executor.c > index ac73e1dde..d1eca3c05 100644 > --- a/runner/executor.c > +++ b/runner/executor.c > @@ -30,6 +30,7 @@ > > #include "igt_aux.h" > #include "igt_core.h" > +#include "igt_facts.h" > #include "igt_taints.h" > #include "igt_vec.h" > #include "executor.h" > @@ -2306,6 +2307,9 @@ bool execute(struct execute_state *state, > sigset_t sigmask; > double time_spent = 0.0; > bool status = true; > + char *last_test = NULL; > + > + igt_facts_lists_init(); > > if (state->dry) { > outf("Dry run, not executing. Invoke igt_resume if you want to execute.\n"); > @@ -2438,6 +2442,10 @@ bool execute(struct execute_state *state, > int result; > bool already_written = false; > > + /* Calls before running each test */ > + igt_facts(last_test); > + last_test = entry_display_name(&job_list->entries[state->next]); > + > if (should_die_because_signal(sigfd)) { > status = false; > goto end; > @@ -2526,6 +2534,8 @@ bool execute(struct execute_state *state, > return execute(state, settings, job_list); > } > } > + /* Last call to collect facts after the last test runs */ > + igt_facts(last_test); > > if ((timefd = openat(resdirfd, "endtime.txt", O_CREAT | O_WRONLY | O_EXCL, 0666)) >= 0) { > dprintf(timefd, "%f\n", timeofday_double()); > diff --git a/tools/lsfacts.c b/tools/lsfacts.c > new file mode 100644 > index 000000000..10dee0317 > --- /dev/null > +++ b/tools/lsfacts.c > @@ -0,0 +1,25 @@ > +// SPDX-License-Identifier: MIT > +// Copyright © 2024 Intel Corporation > + > +#include "igt.h" > +#include "igt_facts.h" > + > +/** > + * SECTION:lsfacts > + * @short_description: lsfacts > + * @title: lsfacts > + * @include: lsfacts.c > + * > + * # lsfacts > + * > + * Scan for igt-facts and print them on screen. Indicate if no facts are found. > + */ > +int main(int argc, char *argv[]) > +{ > + igt_facts_lists_init(); > + > + igt_facts("lsfacts"); > + > + if (igt_facts_are_all_lists_empty()) > + igt_info("No facts found...\n"); > +} > diff --git a/tools/meson.build b/tools/meson.build > index 48c9a4b50..ff1b0ef90 100644 > --- a/tools/meson.build > +++ b/tools/meson.build > @@ -42,6 +42,7 @@ tools_progs = [ > 'intel_gem_info', > 'intel_gvtg_test', > 'dpcd_reg', > + 'lsfacts', > 'lsgpu', > 'power', > ] > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking 2024-12-05 14:05 ` [PATCH i-g-t v10] igt-runner fact checking Janusz Krzysztofik @ 2024-12-06 5:45 ` Peter Senna Tschudin 2024-12-09 9:17 ` Janusz Krzysztofik 0 siblings, 1 reply; 13+ messages in thread From: Peter Senna Tschudin @ 2024-12-06 5:45 UTC (permalink / raw) To: Janusz Krzysztofik, igt-dev@lists.freedesktop.org Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi, luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray, dominik.karol.piatkowski, katarzyna.piecielska Hi Janusz, Thank you for your detailed comments. I appreciate the opportunity to clarify and address your concerns. On 05.12.2024 15:05, Janusz Krzysztofik wrote: > Hi Peter, > > On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote: >> When using igt-runner, collect facts before each test and after the >> last test, and report when facts change. The facts are: >> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20) >> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1 >> - Kernel taints: kernel.is_tainted.taint_warn: true >> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true >> >> This change imposes little execution overhead and adds just a few >> lines of logging. The facts will be printed on normal igt-runner >> output. Here is a real example from our CI shwoing >> hotreplug-lateclose changing the DRM card number > > Since you give that as an example of how helpful your facts can be, and follow > that with a kernel taint example, that may indicate you think, and users of > your facts may then be mislead having that read, that the taint was related to > the change of card number, while both had nothing to do with each other. Let’s take a step back to define the purpose and scope of igt-facts: - Definition of a fact from the dictionary: A fact is an objectively verifiable piece of information. - Purpose of igt-facts: Track which tests cause changes to the facts. The operation is straightforward: facts are collected before and after each test, and any differences are logged. Here’s an example showing a fact change and a new fact after running hotreplug-lateclose: [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true This output highlights the facts without implying causation between them. The tool(and my commit message) neither explains relationships between facts nor misleads users into assuming causation. > > Please add something like ', which is expected,' to your description. Changed > card number is expected, and that's nothing wrong with that. The old, still > open instance of the driver still exists. It is expected to be already > decoupled from hardware, but it still occupies its minor device number. Then, > new instance of the driver, attached to the same hardware, gets first free > minor number. Refresh of IGT device filter before health check can perfectly > handle that case. The igt-facts tool reports changes without qualifying them as expected or unexpected. For example, the change in the DRM card number is logged as a fact, irrespective of its expected nature. > >> and tainting the >> kernel on the abort path: > > No, hotreplug-lateclose doesn't taint the kernel. That's wrong conclusion from > what was reported, or wrong wording at least. Kernel taint (or lockdep not > active) must have been a result of driver issues (or maybe well known > limitations) exposed by the test. Then igt_runner took the abort path since > it detected those unhealthy conditions. Again, users of your facts may be > mislead if your messages can really suggest what you tell us about what you > think has happened. The kernel taint reported is a fact observed after the test. While its root cause lies within the driver or other system components, this is outside the scope of igt-facts. The report simply reflects what changed during the test. To summarize: - igt-facts is a factual reporting tool. It does not establish causation or interpret changes. - Both the DRM card number change and kernel taint were factual observations post hotreplug-lateclose. I hope this clarifies the intent and operation of igt-facts. Please let me know if further discussion is needed. Thank you, Peter > > The whole idea of testing i915 resistance to hot unplug scenarios came from > perceived cases of VFs potentially disappearing from VMs. Some significant > effort was put on that feature of the driver, but since the 'hot' test > variants were never unblocked in CI, things tended to get worse and worse over > time while new driver features that didn't care for hot unplug capability were > added. > > Thanks, > Janusz > >> >> [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose) >> [245.383596] Starting subtest: hotreplug-lateclose >> [249.843361] Aborting: Lockdep not active >> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 >> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true >> [249.859075] Closing watchdogs >> >> CC: Ryszard Knop <ryszard.knop@intel.com> >> CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> >> CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> >> CC: Lucas De Marchi <lucas.demarchi@intel.com> >> CC: luciano.coelho@intel.com >> CC: nirmoy.das@intel.com >> CC: stuart.summers@intel.com >> CC: himal.prasad.ghimiray@intel.com >> CC: dominik.karol.piatkowski@intel.com >> CC: katarzyna.piecielska@intel.com >> Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com> >> --- >> v10: >> - fix memory leaks from asprintf (Thank you Dominik Karol!) >> - fix comments for consistency (Thank you Dominik Karol!) >> >> v9: >> - do not report new hardware when loading/unloading kmod changes >> the string of the GPU name. I accidentally reintroduced this >> issue when refactoring to use linked lists. >> - add tools/lsfacts: 9 lines of code that print either the facts >> or that no facts were found. >> - fix code comments describing functions >> - fix white space issues >> >> v8: >> - fix white space issues >> >> v7: >> - refactor to use linked lists provided by igt_lists >> - Added function arguments to code comments >> - updated commit message >> >> v6: >> - sort includes in igt_facts.c alphabetically >> - add facts for kernel taints using igt_kernel_tainted() and >> igt_explain_taints() >> >> v5: >> - fix the broken patch format from v4 >> >> v4: >> - fix a bug on delete_fact() >> - drop glib and calls to g_ functions >> - change commit message to indicate that report only on fact changes >> - use consistent format for reporting changes >> - fix SPDX header format >> >> v3: >> - refreshed commit message >> - changed format SPDX string >> - removed license text >> - replace last_test assignment when null by two ternary operators >> - added function descriptions following example found elsewhere in >> the code >> - added igt_assert to catch failures to realloc() >> >> v2: >> - add lib/tests/igt_facts.c for basic unit testing >> - bugfix: do not report a new gpu when the driver changes the gpu name >> - bugfix: do not report the pci_id twice on the gpu name >> >> lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++ >> lib/igt_facts.h | 47 +++ >> lib/meson.build | 1 + >> lib/tests/igt_facts.c | 15 + >> lib/tests/meson.build | 1 + >> runner/executor.c | 10 + >> tools/lsfacts.c | 25 ++ >> tools/meson.build | 1 + >> 8 files changed, 855 insertions(+) >> create mode 100644 lib/igt_facts.c >> create mode 100644 lib/igt_facts.h >> create mode 100644 lib/tests/igt_facts.c >> create mode 100644 tools/lsfacts.c >> >> diff --git a/lib/igt_facts.c b/lib/igt_facts.c >> new file mode 100644 >> index 000000000..4749d3417 >> --- /dev/null >> +++ b/lib/igt_facts.c >> @@ -0,0 +1,755 @@ >> +// SPDX-License-Identifier: MIT >> +// Copyright © 2024 Intel Corporation >> + >> +#include <ctype.h> >> +#include <libudev.h> >> +#include <stdio.h> >> +#include <sys/time.h> >> +#include <time.h> >> + >> +#include "igt_core.h" >> +#include "igt_device_scan.h" >> +#include "igt_facts.h" >> +#include "igt_kmod.h" >> +#include "igt_list.h" >> +#include "igt_taints.h" >> + >> +static struct igt_list_head igt_facts_list_drm_card_head; >> +static struct igt_list_head igt_facts_list_kmod_head; >> +static struct igt_list_head igt_facts_list_ktaint_head; >> +static struct igt_list_head igt_facts_list_pci_gpu_head; >> + >> + >> +/** >> + * igt_facts_lists_init: >> + * >> + * Initialize igt_facts linked lists. >> + * >> + * Returns: void >> + */ >> +void igt_facts_lists_init(void) >> +{ >> + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head); >> + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head); >> + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head); >> + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head); >> +} >> + >> + >> +/** >> + * igt_facts_log: >> + * @last_test: name of the test that triggered the fact >> + * @name: name of the fact >> + * @new_value: new value of the fact >> + * @old_value: old value of the fact >> + * >> + * Reports fact changes: >> + * - new fact: if old_value is NULL and new_value is not NULL >> + * - deleted fact: if new_value is NULL and old_value is not NULL >> + * - changed fact: if new_value is different from old_value >> + * >> + * Returns: void >> + */ >> +static void igt_facts_log(const char *last_test, const char *name, >> + const char *new_value, const char *old_value) >> +{ >> + struct timespec uptime_ts; >> + char *uptime = NULL; >> + const char *before_tests = "before any test"; >> + >> + if (old_value == NULL && new_value == NULL) >> + return; >> + >> + if (clock_gettime(CLOCK_BOOTTIME, &uptime_ts) != 0) >> + return; >> + >> + asprintf(&uptime, >> + "%ld.%06ld", >> + uptime_ts.tv_sec, >> + uptime_ts.tv_nsec / 1000); >> + >> + /* New fact */ >> + if (old_value == NULL && new_value != NULL) { >> + igt_info("[%s] [FACT %s] new: %s: %s\n", >> + uptime, >> + last_test ? last_test : before_tests, >> + name, >> + new_value); >> + goto out; >> + } >> + >> + /* Update fact */ >> + if (old_value != NULL && new_value != NULL) { >> + igt_info("[%s] [FACT %s] changed: %s: %s -> %s\n", >> + uptime, >> + last_test ? last_test : before_tests, >> + name, >> + old_value, >> + new_value); >> + goto out; >> + } >> + >> + /* Deleted fact */ >> + if (old_value != NULL && new_value == NULL) { >> + igt_info("[%s] [FACT %s] deleted: %s: %s\n", >> + uptime, >> + last_test ? last_test : before_tests, >> + name, >> + old_value); >> + goto out; >> + } >> + >> +out: >> + free(uptime); >> +} >> + >> +/** >> + * igt_facts_list_get: >> + * @name: name of the fact to be added >> + * @head: head of the list >> + * >> + * Get a fact from the list. >> + * >> + * Returns: pointer to the fact if found, NULL otherwise >> + * >> + */ >> +static igt_fact *igt_facts_list_get(const char *name, >> + struct igt_list_head *head) >> +{ >> + igt_fact *fact = NULL; >> + >> + if (igt_list_empty(head)) >> + return NULL; >> + >> + igt_list_for_each_entry(fact, head, link) { >> + if (strcmp(fact->name, name) == 0) >> + return fact; >> + } >> + return NULL; >> +} >> + >> +/** >> + * igt_facts_list_del: >> + * @name: name of the fact to be added >> + * @head: head of the list >> + * @last_test: name of the last test >> + * @log: bool indicating if the delete operation should be logged >> + * >> + * Delete a fact from the list. >> + * >> + * Returns: bool indicating if fact was deleted from the list >> + * >> + */ >> +static bool igt_facts_list_del(const char *name, >> + struct igt_list_head *head, >> + const char *last_test, >> + bool log) >> +{ >> + igt_fact *fact = NULL; >> + >> + if (igt_list_empty(head)) >> + return false; >> + >> + igt_list_for_each_entry(fact, head, link) { >> + if (strcmp(fact->name, name) == 0) { >> + if (log) >> + igt_facts_log(last_test, fact->name, >> + NULL, fact->value); >> + >> + igt_list_del(&fact->link); >> + free(fact->name); >> + free(fact->value); >> + free(fact->last_test); >> + free(fact); >> + return true; >> + } >> + } >> + return false; >> +} >> + >> +/** >> + * igt_facts_list_add: >> + * @name: name of the fact to be added >> + * @value: value of the fact to be added >> + * @last_test: name of the last test >> + * @head: head of the list >> + * >> + * Returns: bool indicating if fact was added to the list >> + * >> + */ >> +static bool igt_facts_list_add(const char *name, >> + const char *value, >> + const char *last_test, >> + struct igt_list_head *head) >> +{ >> + igt_fact *new_fact = NULL, *old_fact = NULL; >> + bool logged = false; >> + >> + if (name == NULL || value == NULL) >> + return false; >> + >> + old_fact = igt_facts_list_get(name, head); >> + if (old_fact) { >> + if (strcmp(old_fact->value, value) == 0) { >> + old_fact->present = true; >> + return false; >> + } >> + igt_facts_log(last_test, name, value, old_fact->value); >> + logged = true; >> + igt_facts_list_del(name, head, last_test, false); >> + } >> + >> + new_fact = malloc(sizeof(igt_fact)); >> + if (new_fact == NULL) >> + return false; >> + >> + new_fact->name = strdup(name); >> + new_fact->value = strdup(value); >> + new_fact->last_test = last_test ? strdup(last_test) : NULL; >> + new_fact->present = true; >> + >> + if (!logged) >> + igt_facts_log(last_test, name, value, NULL); >> + >> + igt_list_add(&new_fact->link, head); >> + >> + return true; >> +} >> + >> +/** >> + * igt_facts_list_mark: >> + * @head: head of the list >> + * >> + * Mark all facts in the list as not present. Opted for the mark and sweep >> + * design pattern due to its simplicity and efficiency. >> + * >> + * Returns: void >> + */ >> +static void igt_facts_list_mark(struct igt_list_head *head) >> +{ >> + igt_fact *fact = NULL; >> + >> + if (igt_list_empty(head)) >> + return; >> + >> + igt_list_for_each_entry(fact, head, link) >> + fact->present = false; >> +} >> + >> +/** >> + * igt_facts_list_sweep: >> + * @head: head of the list >> + * @last_test: name of the last test >> + * >> + * Sweep the list and delete all facts that are not present. Opted for the mark >> + * and sweep design pattern due to its simplicity and efficiency. >> + * >> + * Returns: void >> + */ >> +static void igt_facts_list_sweep(struct igt_list_head *head, >> + const char *last_test) >> +{ >> + igt_fact *fact = NULL, *tmp = NULL; >> + >> + if (igt_list_empty(head)) >> + return; >> + >> + igt_list_for_each_entry_safe(fact, tmp, head, link) >> + if (!fact->present) >> + igt_facts_list_del(fact->name, head, last_test, true); >> +} >> + >> +/** >> + * igt_facts_list_mark_and_sweep: >> + * @head: head of the list >> + * >> + * Clean up the list using mark and sweep. Opted for the mark and sweep >> + * design pattern due to its simplicity and efficiency. >> + * >> + * Returns: void >> + */ >> +static void igt_facts_list_mark_and_sweep(struct igt_list_head *head) >> +{ >> + igt_facts_list_mark(head); >> + igt_facts_list_sweep(head, NULL); >> +} >> + >> +/** >> + * igt_facts_are_all_lists_empty: >> + * >> + * Returns true if all lists are empty. Used by the tool lsfacts. >> + * >> + * Returns: bool >> + */ >> +bool igt_facts_are_all_lists_empty(void) >> +{ >> + return igt_list_empty(&igt_facts_list_drm_card_head) && >> + igt_list_empty(&igt_facts_list_kmod_head) && >> + igt_list_empty(&igt_facts_list_ktaint_head) && >> + igt_list_empty(&igt_facts_list_pci_gpu_head); >> +} >> + >> +/** >> + * igt_facts_scan_pci_gpus: >> + * @last_test: name of the last test >> + * >> + * This function scans the pci bus for gpus using udev. It uses >> + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to >> + * update igt_facts_list_pci_gpu_head. >> + * >> + * Returns: void >> + */ >> +static void igt_facts_scan_pci_gpus(const char *last_test) >> +{ >> + static struct igt_list_head *head = &igt_facts_list_pci_gpu_head; >> + struct udev *udev = NULL; >> + struct udev_enumerate *enumerate = NULL; >> + struct udev_list_entry *devices, *dev_list_entry; >> + struct igt_device_card card; >> + char pcistr[10]; >> + int ret; >> + char *factname = NULL; >> + char *factvalue = NULL; >> + >> + udev = udev_new(); >> + if (!udev) { >> + igt_warn("Failed to create udev context\n"); >> + return; >> + } >> + >> + enumerate = udev_enumerate_new(udev); >> + if (!enumerate) { >> + igt_warn("Failed to create udev enumerate\n"); >> + udev_unref(udev); >> + return; >> + } >> + >> + ret = udev_enumerate_add_match_subsystem(enumerate, "pci"); >> + if (ret < 0) >> + goto out; >> + >> + ret = udev_enumerate_add_match_property(enumerate, >> + "PCI_CLASS", >> + "30000"); >> + if (ret < 0) >> + goto out; >> + >> + ret = udev_enumerate_add_match_property(enumerate, >> + "PCI_CLASS", >> + "38000"); >> + if (ret < 0) >> + goto out; >> + >> + ret = udev_enumerate_scan_devices(enumerate); >> + if (ret < 0) >> + goto out; >> + >> + devices = udev_enumerate_get_list_entry(enumerate); >> + if (!devices) >> + goto out; >> + >> + igt_facts_list_mark(head); >> + >> + udev_list_entry_foreach(dev_list_entry, devices) { >> + const char *path; >> + struct udev_device *udev_dev; >> + struct udev_list_entry *entry; >> + char *model = NULL; >> + char *codename = NULL; >> + igt_fact *old_fact = NULL; >> + >> + path = udev_list_entry_get_name(dev_list_entry); >> + udev_dev = udev_device_new_from_syspath(udev, path); >> + if (!udev_dev) >> + continue; >> + >> + /* Strip path to only the content after the last / */ >> + path = strrchr(path, '/'); >> + if (path) >> + path++; >> + else >> + path = "unknown"; >> + >> + strcpy(card.pci_slot_name, "-"); >> + >> + entry = udev_device_get_properties_list_entry(udev_dev); >> + while (entry) { >> + const char *name = udev_list_entry_get_name(entry); >> + const char *value = udev_list_entry_get_value(entry); >> + >> + entry = udev_list_entry_get_next(entry); >> + if (!strcmp(name, "ID_MODEL_FROM_DATABASE")) >> + model = strdup(value); >> + else if (!strcmp(name, "PCI_ID")) >> + igt_assert_eq(sscanf(value, "%hx:%hx", >> + &card.pci_vendor, >> + &card.pci_device), 2); >> + } >> + snprintf(pcistr, sizeof(pcistr), "%04x:%04x", >> + card.pci_vendor, card.pci_device); >> + codename = igt_device_get_pretty_name(&card, false); >> + >> + /* Set codename to null if it is the same string as pci_id */ >> + if (codename && strcmp(pcistr, codename) == 0) { >> + free(codename); >> + codename = NULL; >> + } >> + asprintf(&factname, "%s.%s", pci_gpu_fact, path); >> + asprintf(&factvalue, >> + "%s %s %s", >> + pcistr, >> + codename ? codename : "", >> + model ? model : ""); >> + >> + /** >> + * Loading and unloading the kmod may change the human >> + * readeable string in value. Do not change value if the >> + * pci id is the same. >> + */ >> + old_fact = igt_facts_list_get(factname, head); >> + if (old_fact && strncmp(old_fact->value, factvalue, 9) == 0) >> + old_fact->present = true; >> + else >> + igt_facts_list_add(factname, factvalue, last_test, head); >> + >> + free(codename); >> + free(model); >> + free(factname); >> + free(factvalue); >> + udev_device_unref(udev_dev); >> + } >> + >> + igt_facts_list_sweep(head, last_test); >> + >> +out: >> + udev_enumerate_unref(enumerate); >> + udev_unref(udev); >> +} >> + >> +/** >> + * igt_facts_scan_pci_drm_cards: >> + * @last_test: name of the last test >> + * >> + * This function scans the pci bus for drm cards using udev. It uses the >> + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to >> + * update igt_facts_list_drm_card_head. >> + * >> + * Returns: void >> + */ >> +static void igt_facts_scan_pci_drm_cards(const char *last_test) >> +{ >> + static struct igt_list_head *head = &igt_facts_list_drm_card_head; >> + struct udev *udev = NULL; >> + struct udev_enumerate *enumerate = NULL; >> + struct udev_list_entry *devices, *dev_list_entry; >> + int ret; >> + char *factname = NULL; >> + char *factvalue = NULL; >> + >> + udev = udev_new(); >> + if (!udev) >> + return; >> + >> + enumerate = udev_enumerate_new(udev); >> + if (!enumerate) { >> + udev_unref(udev); >> + return; >> + } >> + >> + ret = udev_enumerate_add_match_subsystem(enumerate, "drm"); >> + if (ret < 0) >> + goto out; >> + >> + ret = udev_enumerate_scan_devices(enumerate); >> + if (ret < 0) >> + goto out; >> + >> + devices = udev_enumerate_get_list_entry(enumerate); >> + if (!devices) >> + goto out; >> + >> + ret = udev_enumerate_add_match_subsystem(enumerate, "drm"); >> + if (ret < 0) >> + goto out; >> + >> + ret = udev_enumerate_scan_devices(enumerate); >> + if (ret < 0) >> + goto out; >> + >> + devices = udev_enumerate_get_list_entry(enumerate); >> + if (!devices) >> + goto out; >> + >> + igt_facts_list_mark(head); >> + >> + udev_list_entry_foreach(dev_list_entry, devices) { >> + const char *path; >> + struct udev_device *drm_dev, *pci_dev; >> + const char *drm_name, *pci_addr; >> + >> + path = udev_list_entry_get_name(dev_list_entry); >> + drm_dev = udev_device_new_from_syspath(udev, path); >> + if (!drm_dev) >> + continue; >> + >> + drm_name = udev_device_get_sysname(drm_dev); >> + /* Filter the device by name. Want devices such as card0 and card1. >> + * If the device has '-' in the name, contine >> + */ >> + if (strncmp(drm_name, "card", 4) != 0 || >> + strchr(drm_name, '-') != NULL) { >> + udev_device_unref(drm_dev); >> + continue; >> + } >> + >> + /* Get the pci address of the gpu associated with the drm_dev*/ >> + pci_dev = udev_device_get_parent_with_subsystem_devtype(drm_dev, >> + "pci", >> + NULL); >> + if (pci_dev) { >> + pci_addr = udev_device_get_sysattr_value(pci_dev, >> + "address"); >> + if (!pci_addr) >> + pci_addr = udev_device_get_sysname(pci_dev); >> + } else { >> + /* Some GPUs are platform devices. Ignore them. */ >> + pci_addr = NULL; >> + udev_device_unref(drm_dev); >> + continue; >> + } >> + >> + asprintf(&factname, "%s.%s", drm_card_fact, pci_addr); >> + asprintf(&factvalue, "%s", drm_name); >> + >> + igt_facts_list_add(factname, factvalue, last_test, head); >> + >> + free(factname); >> + free(factvalue); >> + udev_device_unref(drm_dev); >> + } >> + >> + igt_facts_list_sweep(head, last_test); >> + >> +out: >> + udev_enumerate_unref(enumerate); >> + udev_unref(udev); >> +} >> + >> +/** >> + * igt_facts_scan_kernel_taints: >> + * @last_test: name of the last test >> + * >> + * This function scans for kernel taints using igt_kernel_tainted() and >> + * igt_explain_taints(). It will cut off the explanation keeping only the >> + * taint name. >> + * >> + * Returns: void >> + */ >> +static void igt_facts_scan_kernel_taints(const char *last_test) >> +{ >> + static struct igt_list_head *head = &igt_facts_list_ktaint_head; >> + unsigned long taints = 0; >> + const char *reason = NULL; >> + char *taint_name = NULL; >> + char *fact_name = NULL; >> + >> + taints = igt_kernel_tainted(&taints); >> + /* For testing, set all bits to 1 >> + * taints = 0xFFFFFFFF; >> + */ >> + >> + >> + igt_facts_list_mark(head); >> + >> + while ((reason = igt_explain_taints(&taints)) != NULL) { >> + /* Cut at the ':' to get only the taint name */ >> + taint_name = strtok(strdup(reason), ":"); >> + if (!taint_name) >> + continue; >> + >> + /* Lowercase taint_name */ >> + for (int i = 0; taint_name[i]; i++) >> + taint_name[i] = tolower(taint_name[i]); >> + >> + asprintf(&fact_name, "%s.%s", ktaint_fact, taint_name); >> + igt_facts_list_add(fact_name, "true", last_test, head); >> + >> + free(taint_name); >> + free(fact_name); >> + } >> + >> + igt_facts_list_sweep(head, last_test); >> +} >> + >> + >> +/** >> + * igt_facts_scan_kernel_loaded_kmods: >> + * @last_test: name of the last test >> + * >> + * This function scans for loaded kmods using igt_fact_kmod_list and >> + * igt_kmod_is_loaded(). >> + * >> + * Returns: void >> + */ >> +static void igt_facts_scan_kernel_loaded_kmods(const char *last_test) >> +{ >> + static struct igt_list_head *head = &igt_facts_list_kmod_head; >> + char *name = NULL; >> + >> + igt_facts_list_mark(head); >> + >> + /* Iterate over igt_fact_kmod_list[] until the element contains "\0" */ >> + for (int i = 0; strcmp(igt_fact_kmod_list[i], "\0") != 0; i++) { >> + asprintf(&name, "%s.%s", kmod_fact, igt_fact_kmod_list[i]); >> + if (igt_kmod_is_loaded(igt_fact_kmod_list[i])) >> + igt_facts_list_add(name, "true", last_test, head); >> + >> + free(name); >> + } >> + >> + igt_facts_list_sweep(head, last_test); >> +} >> + >> +/** >> + * igt_facts: >> + * @last_test: name of the last test >> + * >> + * Call this function where you want to gather and report facts. >> + * >> + * Returns: void >> + */ >> +void igt_facts(const char *last_test) >> +{ >> + igt_facts_scan_pci_gpus(last_test); >> + igt_facts_scan_pci_drm_cards(last_test); >> + igt_facts_scan_kernel_taints(last_test); >> + igt_facts_scan_kernel_loaded_kmods(last_test); >> + >> + fflush(stdout); >> + fflush(stderr); >> +} >> + >> +/* >> + * Testing >> + * >> + * Defined here to keep most of the functions static >> + * >> + */ >> + >> +/** >> + * igt_facts_test_add_get: >> + * @head: head of the list >> + * >> + * Tests igt_facts_list_add and igt_facts_list_get. >> + * >> + * Returns: void >> + */ >> +static void igt_facts_test_add_get(struct igt_list_head *head) >> +{ >> + igt_fact *fact = NULL; >> + bool ret; >> + const char *name = "hardware.pci.gpu_at_addr.0000:00:02.0"; >> + const char *value = "8086:64a0 Intel Lunarlake (Gen20)"; >> + const char *last_test = NULL; >> + >> + ret = igt_facts_list_add(name, value, last_test, head); >> + igt_assert(ret == true); >> + >> + // Assert that there is one element in the linked list >> + igt_assert_eq(igt_list_length(head), 1); >> + >> + // Assert that the element in the linked list is the one we added >> + fact = igt_facts_list_get(name, head); >> + igt_assert(fact != NULL); >> + igt_assert_eq(strcmp(fact->name, name), 0); >> + igt_assert_eq(strcmp(fact->value, value), 0); >> + igt_assert(fact->present == true); >> + igt_assert(fact->last_test == NULL); >> +} >> + >> +/** >> + * igt_facts_test_mark_and_sweep: >> + * @head: head of the list >> + * >> + * - Add 3 elements to the list and mark them as not present. >> + * - Update two of the elements and mark them as present. >> + * - Sweep the list and assert that >> + * - Only the two updated elements are present >> + * - The third element was deleted >> + * >> + * Returns: void >> + */ >> +static void igt_facts_test_mark_and_sweep(struct igt_list_head *head) >> +{ >> + igt_fact *fact = NULL; >> + const char *name1 = "hardware.pci.gpu_at_addr.0000:00:02.0"; >> + const char *value1 = "8086:64a0 Intel Lunarlake (Gen20)"; >> + const char *name2 = "hardware.pci.gpu_at_addr.0000:00:03.0"; >> + const char *value2 = "8086:64a1 Intel Lunarlake (Gen21)"; >> + const char *name3 = "hardware.pci.gpu_at_addr.0000:00:04.0"; >> + const char *value3 = "8086:64a2 Intel Lunarlake (Gen22)"; >> + >> + igt_facts_list_add(name1, value1, NULL, head); >> + igt_facts_list_add(name2, value2, NULL, head); >> + igt_facts_list_add(name3, value3, NULL, head); >> + >> + igt_facts_list_mark(head); >> + >> + igt_facts_list_add(name1, value1, NULL, head); >> + igt_facts_list_add(name2, value2, NULL, head); >> + >> + igt_facts_list_sweep(head, NULL); >> + >> + // Assert that there are two elements in the linked list >> + igt_assert_eq(igt_list_length(head), 2); >> + >> + // Assert that the two updated elements are present >> + fact = igt_facts_list_get(name1, head); >> + igt_assert(fact != NULL); >> + igt_assert(fact->present == true); >> + >> + fact = igt_facts_list_get(name2, head); >> + igt_assert(fact != NULL); >> + igt_assert(fact->present == true); >> + >> + // Assert that the third element was deleted >> + fact = igt_facts_list_get(name3, head); >> + igt_assert(fact == NULL); >> +} >> + >> +/** >> + * igt_facts_test: >> + * >> + * Main function for testing the igt_facts module >> + * >> + * Returns: bool indicating if the tests passed >> + */ >> +void igt_facts_test(void) >> +{ >> + const char *last_test = "Unit Testing"; >> + >> + igt_facts_lists_init(); >> + >> + /* Assert that all lists are empty */ >> + igt_assert(igt_list_empty(&igt_facts_list_kmod_head)); >> + igt_assert(igt_list_empty(&igt_facts_list_ktaint_head)); >> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head)); >> + igt_assert(igt_list_empty(&igt_facts_list_drm_card_head)); >> + >> + /* Assert that add and get work. Will add one element to the list */ >> + igt_facts_test_add_get(&igt_facts_list_pci_gpu_head); >> + >> + /* Assert that igt_facts_list_mark_and_sweep() cleans up the list */ >> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == false); >> + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head); >> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == true); >> + >> + /* Test the mark and sweep pattern used to delete elements >> + * from the list >> + */ >> + igt_facts_test_mark_and_sweep(&igt_facts_list_pci_gpu_head); >> + >> + /* Clean up the list and call igt_facts(). This should not crash */ >> + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head); >> + igt_facts(last_test); >> +} >> diff --git a/lib/igt_facts.h b/lib/igt_facts.h >> new file mode 100644 >> index 000000000..e4adca3fb >> --- /dev/null >> +++ b/lib/igt_facts.h >> @@ -0,0 +1,47 @@ >> +/* SPDX-License-Identifier: MIT >> + * Copyright © 2024 Intel Corporation >> + */ >> + >> +#include <stdbool.h> >> + >> +#include "igt_list.h" >> + >> + >> +/* igt_fact: >> + * @name: name of the fact >> + * @value: value of the fact >> + * @last_test: name of the test that triggered the fact >> + * @present: bool indicating if fact is present. Used for deleting facts from >> + * the list. >> + * @link: link to the next fact >> + * >> + * A fact is a piece of information that can be used to determine the state of >> + * the system. >> + * >> + */ >> +typedef struct { >> + char *name; >> + char *value; >> + char *last_test; >> + bool present; /* For mark and seep */ >> + struct igt_list_head link; >> +} igt_fact; >> + >> +const char *igt_fact_kmod_list[] = { >> + "amdgpu", >> + "i915", >> + "nouveau", >> + "radeon", >> + "xe", >> + "\0" >> +}; >> + >> +const char *kmod_fact = "kernel.kmod_is_loaded"; /* true or false */ >> +const char *ktaint_fact = "kernel.is_tainted"; /* taint name: taint_warn */ >> +const char *pci_gpu_fact = "hardware.pci.gpu_at_addr"; /* id vendor model */ >> +const char *drm_card_fact = "hardware.pci.drm_card_at_addr"; /* cardX */ >> + >> +void igt_facts_lists_init(void); >> +void igt_facts(const char *last_test); >> +bool igt_facts_are_all_lists_empty(void); >> +void igt_facts_test(void); /* For unit testing only */ >> diff --git a/lib/meson.build b/lib/meson.build >> index c3556a921..c44ca2b5a 100644 >> --- a/lib/meson.build >> +++ b/lib/meson.build >> @@ -18,6 +18,7 @@ lib_sources = [ >> 'i915/i915_crc.c', >> 'igt_collection.c', >> 'igt_color_encoding.c', >> + 'igt_facts.c', >> 'igt_crc.c', >> 'igt_debugfs.c', >> 'igt_device.c', >> diff --git a/lib/tests/igt_facts.c b/lib/tests/igt_facts.c >> new file mode 100644 >> index 000000000..7fa9d0f22 >> --- /dev/null >> +++ b/lib/tests/igt_facts.c >> @@ -0,0 +1,15 @@ >> +// SPDX-License-Identifier: MIT >> +// Copyright © 2024 Intel Corporation >> + >> +#include <stdbool.h> >> + >> +#include "igt_core.h" >> +#include "igt_facts.h" >> + >> +/* Tests are not defined here so we can keep most of the functions static */ >> + >> +igt_simple_main >> +{ >> + igt_info("Running igt_facts_test\n"); >> + igt_facts_test(); >> +} >> diff --git a/lib/tests/meson.build b/lib/tests/meson.build >> index df8092638..1ce19f63c 100644 >> --- a/lib/tests/meson.build >> +++ b/lib/tests/meson.build >> @@ -8,6 +8,7 @@ lib_tests = [ >> 'igt_dynamic_subtests', >> 'igt_edid', >> 'igt_exit_handler', >> + 'igt_facts', >> 'igt_fork', >> 'igt_fork_helper', >> 'igt_hook', >> diff --git a/runner/executor.c b/runner/executor.c >> index ac73e1dde..d1eca3c05 100644 >> --- a/runner/executor.c >> +++ b/runner/executor.c >> @@ -30,6 +30,7 @@ >> >> #include "igt_aux.h" >> #include "igt_core.h" >> +#include "igt_facts.h" >> #include "igt_taints.h" >> #include "igt_vec.h" >> #include "executor.h" >> @@ -2306,6 +2307,9 @@ bool execute(struct execute_state *state, >> sigset_t sigmask; >> double time_spent = 0.0; >> bool status = true; >> + char *last_test = NULL; >> + >> + igt_facts_lists_init(); >> >> if (state->dry) { >> outf("Dry run, not executing. Invoke igt_resume if you want to execute.\n"); >> @@ -2438,6 +2442,10 @@ bool execute(struct execute_state *state, >> int result; >> bool already_written = false; >> >> + /* Calls before running each test */ >> + igt_facts(last_test); >> + last_test = entry_display_name(&job_list->entries[state->next]); >> + >> if (should_die_because_signal(sigfd)) { >> status = false; >> goto end; >> @@ -2526,6 +2534,8 @@ bool execute(struct execute_state *state, >> return execute(state, settings, job_list); >> } >> } >> + /* Last call to collect facts after the last test runs */ >> + igt_facts(last_test); >> >> if ((timefd = openat(resdirfd, "endtime.txt", O_CREAT | O_WRONLY | O_EXCL, 0666)) >= 0) { >> dprintf(timefd, "%f\n", timeofday_double()); >> diff --git a/tools/lsfacts.c b/tools/lsfacts.c >> new file mode 100644 >> index 000000000..10dee0317 >> --- /dev/null >> +++ b/tools/lsfacts.c >> @@ -0,0 +1,25 @@ >> +// SPDX-License-Identifier: MIT >> +// Copyright © 2024 Intel Corporation >> + >> +#include "igt.h" >> +#include "igt_facts.h" >> + >> +/** >> + * SECTION:lsfacts >> + * @short_description: lsfacts >> + * @title: lsfacts >> + * @include: lsfacts.c >> + * >> + * # lsfacts >> + * >> + * Scan for igt-facts and print them on screen. Indicate if no facts are found. >> + */ >> +int main(int argc, char *argv[]) >> +{ >> + igt_facts_lists_init(); >> + >> + igt_facts("lsfacts"); >> + >> + if (igt_facts_are_all_lists_empty()) >> + igt_info("No facts found...\n"); >> +} >> diff --git a/tools/meson.build b/tools/meson.build >> index 48c9a4b50..ff1b0ef90 100644 >> --- a/tools/meson.build >> +++ b/tools/meson.build >> @@ -42,6 +42,7 @@ tools_progs = [ >> 'intel_gem_info', >> 'intel_gvtg_test', >> 'dpcd_reg', >> + 'lsfacts', >> 'lsgpu', >> 'power', >> ] >> > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking 2024-12-06 5:45 ` Peter Senna Tschudin @ 2024-12-09 9:17 ` Janusz Krzysztofik 2024-12-09 11:06 ` Peter Senna Tschudin 0 siblings, 1 reply; 13+ messages in thread From: Janusz Krzysztofik @ 2024-12-09 9:17 UTC (permalink / raw) To: igt-dev@lists.freedesktop.org, Peter Senna Tschudin Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi, luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray, dominik.karol.piatkowski, katarzyna.piecielska, Janusz Krzysztofik On Friday, 6 December 2024 06:45:31 CET Peter Senna Tschudin wrote: > Hi Janusz, > > Thank you for your detailed comments. I appreciate the opportunity > to clarify and address your concerns. > > On 05.12.2024 15:05, Janusz Krzysztofik wrote: > > Hi Peter, > > > > On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote: > >> When using igt-runner, collect facts before each test and after the > >> last test, and report when facts change. The facts are: > >> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20) > >> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1 > >> - Kernel taints: kernel.is_tainted.taint_warn: true > >> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true > >> > >> This change imposes little execution overhead and adds just a few > >> lines of logging. The facts will be printed on normal igt-runner > >> output. Here is a real example from our CI shwoing > >> hotreplug-lateclose changing the DRM card number > > > > Since you give that as an example of how helpful your facts can be, and follow > > that with a kernel taint example, that may indicate you think, and users of > > your facts may then be mislead having that read, that the taint was related to > > the change of card number, while both had nothing to do with each other. > > Let’s take a step back to define the purpose and scope of igt-facts: > - Definition of a fact from the dictionary: A fact is an objectively verifiable > piece of information. > - Purpose of igt-facts: Track which tests cause changes to the facts. > > The operation is straightforward: facts are collected before and after each test, > and any differences are logged. Here’s an example showing a fact change and a new > fact after running hotreplug-lateclose: > > [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 > [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true > > This output highlights the facts without implying causation between them. The > tool(and my commit message) neither explains relationships between facts nor > misleads users into assuming causation. For me your commit message does. Can you please provide a full list of "facts" your code is supposed to handle? Can you please explain why you selected just those "facts", not others? Thanks, Janusz > > > > > Please add something like ', which is expected,' to your description. Changed > > card number is expected, and that's nothing wrong with that. The old, still > > open instance of the driver still exists. It is expected to be already > > decoupled from hardware, but it still occupies its minor device number. Then, > > new instance of the driver, attached to the same hardware, gets first free > > minor number. Refresh of IGT device filter before health check can perfectly > > handle that case. > > The igt-facts tool reports changes without qualifying them as expected or > unexpected. For example, the change in the DRM card number is logged as a fact, > irrespective of its expected nature. > > > > >> and tainting the > >> kernel on the abort path: > > > > No, hotreplug-lateclose doesn't taint the kernel. That's wrong conclusion from > > what was reported, or wrong wording at least. Kernel taint (or lockdep not > > active) must have been a result of driver issues (or maybe well known > > limitations) exposed by the test. Then igt_runner took the abort path since > > it detected those unhealthy conditions. Again, users of your facts may be > > mislead if your messages can really suggest what you tell us about what you > > think has happened. > > The kernel taint reported is a fact observed after the test. While its root cause > lies within the driver or other system components, this is outside the scope of > igt-facts. The report simply reflects what changed during the test. > > To summarize: > - igt-facts is a factual reporting tool. It does not establish causation or > interpret changes. > - Both the DRM card number change and kernel taint were factual observations post > hotreplug-lateclose. > > I hope this clarifies the intent and operation of igt-facts. Please let me know if > further discussion is needed. > > Thank you, > > Peter > > > > The whole idea of testing i915 resistance to hot unplug scenarios came from > > perceived cases of VFs potentially disappearing from VMs. Some significant > > effort was put on that feature of the driver, but since the 'hot' test > > variants were never unblocked in CI, things tended to get worse and worse over > > time while new driver features that didn't care for hot unplug capability were > > added. > > > > Thanks, > > Janusz > > > >> > >> [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose) > >> [245.383596] Starting subtest: hotreplug-lateclose > >> [249.843361] Aborting: Lockdep not active > >> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 > >> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true > >> [249.859075] Closing watchdogs > >> > >> CC: Ryszard Knop <ryszard.knop@intel.com> > >> CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> > >> CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> > >> CC: Lucas De Marchi <lucas.demarchi@intel.com> > >> CC: luciano.coelho@intel.com > >> CC: nirmoy.das@intel.com > >> CC: stuart.summers@intel.com > >> CC: himal.prasad.ghimiray@intel.com > >> CC: dominik.karol.piatkowski@intel.com > >> CC: katarzyna.piecielska@intel.com > >> Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com> > >> --- > >> v10: > >> - fix memory leaks from asprintf (Thank you Dominik Karol!) > >> - fix comments for consistency (Thank you Dominik Karol!) > >> > >> v9: > >> - do not report new hardware when loading/unloading kmod changes > >> the string of the GPU name. I accidentally reintroduced this > >> issue when refactoring to use linked lists. > >> - add tools/lsfacts: 9 lines of code that print either the facts > >> or that no facts were found. > >> - fix code comments describing functions > >> - fix white space issues > >> > >> v8: > >> - fix white space issues > >> > >> v7: > >> - refactor to use linked lists provided by igt_lists > >> - Added function arguments to code comments > >> - updated commit message > >> > >> v6: > >> - sort includes in igt_facts.c alphabetically > >> - add facts for kernel taints using igt_kernel_tainted() and > >> igt_explain_taints() > >> > >> v5: > >> - fix the broken patch format from v4 > >> > >> v4: > >> - fix a bug on delete_fact() > >> - drop glib and calls to g_ functions > >> - change commit message to indicate that report only on fact changes > >> - use consistent format for reporting changes > >> - fix SPDX header format > >> > >> v3: > >> - refreshed commit message > >> - changed format SPDX string > >> - removed license text > >> - replace last_test assignment when null by two ternary operators > >> - added function descriptions following example found elsewhere in > >> the code > >> - added igt_assert to catch failures to realloc() > >> > >> v2: > >> - add lib/tests/igt_facts.c for basic unit testing > >> - bugfix: do not report a new gpu when the driver changes the gpu name > >> - bugfix: do not report the pci_id twice on the gpu name > >> > >> lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++ > >> lib/igt_facts.h | 47 +++ > >> lib/meson.build | 1 + > >> lib/tests/igt_facts.c | 15 + > >> lib/tests/meson.build | 1 + > >> runner/executor.c | 10 + > >> tools/lsfacts.c | 25 ++ > >> tools/meson.build | 1 + > >> 8 files changed, 855 insertions(+) > >> create mode 100644 lib/igt_facts.c > >> create mode 100644 lib/igt_facts.h > >> create mode 100644 lib/tests/igt_facts.c > >> create mode 100644 tools/lsfacts.c > >> > >> diff --git a/lib/igt_facts.c b/lib/igt_facts.c > >> new file mode 100644 > >> index 000000000..4749d3417 > >> --- /dev/null > >> +++ b/lib/igt_facts.c > >> @@ -0,0 +1,755 @@ > >> +// SPDX-License-Identifier: MIT > >> +// Copyright © 2024 Intel Corporation > >> + > >> +#include <ctype.h> > >> +#include <libudev.h> > >> +#include <stdio.h> > >> +#include <sys/time.h> > >> +#include <time.h> > >> + > >> +#include "igt_core.h" > >> +#include "igt_device_scan.h" > >> +#include "igt_facts.h" > >> +#include "igt_kmod.h" > >> +#include "igt_list.h" > >> +#include "igt_taints.h" > >> + > >> +static struct igt_list_head igt_facts_list_drm_card_head; > >> +static struct igt_list_head igt_facts_list_kmod_head; > >> +static struct igt_list_head igt_facts_list_ktaint_head; > >> +static struct igt_list_head igt_facts_list_pci_gpu_head; > >> + > >> + > >> +/** > >> + * igt_facts_lists_init: > >> + * > >> + * Initialize igt_facts linked lists. > >> + * > >> + * Returns: void > >> + */ > >> +void igt_facts_lists_init(void) > >> +{ > >> + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head); > >> + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head); > >> + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head); > >> + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head); > >> +} > >> + > >> + > >> +/** > >> + * igt_facts_log: > >> + * @last_test: name of the test that triggered the fact > >> + * @name: name of the fact > >> + * @new_value: new value of the fact > >> + * @old_value: old value of the fact > >> + * > >> + * Reports fact changes: > >> + * - new fact: if old_value is NULL and new_value is not NULL > >> + * - deleted fact: if new_value is NULL and old_value is not NULL > >> + * - changed fact: if new_value is different from old_value > >> + * > >> + * Returns: void > >> + */ > >> +static void igt_facts_log(const char *last_test, const char *name, > >> + const char *new_value, const char *old_value) > >> +{ > >> + struct timespec uptime_ts; > >> + char *uptime = NULL; > >> + const char *before_tests = "before any test"; > >> + > >> + if (old_value == NULL && new_value == NULL) > >> + return; > >> + > >> + if (clock_gettime(CLOCK_BOOTTIME, &uptime_ts) != 0) > >> + return; > >> + > >> + asprintf(&uptime, > >> + "%ld.%06ld", > >> + uptime_ts.tv_sec, > >> + uptime_ts.tv_nsec / 1000); > >> + > >> + /* New fact */ > >> + if (old_value == NULL && new_value != NULL) { > >> + igt_info("[%s] [FACT %s] new: %s: %s\n", > >> + uptime, > >> + last_test ? last_test : before_tests, > >> + name, > >> + new_value); > >> + goto out; > >> + } > >> + > >> + /* Update fact */ > >> + if (old_value != NULL && new_value != NULL) { > >> + igt_info("[%s] [FACT %s] changed: %s: %s -> %s\n", > >> + uptime, > >> + last_test ? last_test : before_tests, > >> + name, > >> + old_value, > >> + new_value); > >> + goto out; > >> + } > >> + > >> + /* Deleted fact */ > >> + if (old_value != NULL && new_value == NULL) { > >> + igt_info("[%s] [FACT %s] deleted: %s: %s\n", > >> + uptime, > >> + last_test ? last_test : before_tests, > >> + name, > >> + old_value); > >> + goto out; > >> + } > >> + > >> +out: > >> + free(uptime); > >> +} > >> + > >> +/** > >> + * igt_facts_list_get: > >> + * @name: name of the fact to be added > >> + * @head: head of the list > >> + * > >> + * Get a fact from the list. > >> + * > >> + * Returns: pointer to the fact if found, NULL otherwise > >> + * > >> + */ > >> +static igt_fact *igt_facts_list_get(const char *name, > >> + struct igt_list_head *head) > >> +{ > >> + igt_fact *fact = NULL; > >> + > >> + if (igt_list_empty(head)) > >> + return NULL; > >> + > >> + igt_list_for_each_entry(fact, head, link) { > >> + if (strcmp(fact->name, name) == 0) > >> + return fact; > >> + } > >> + return NULL; > >> +} > >> + > >> +/** > >> + * igt_facts_list_del: > >> + * @name: name of the fact to be added > >> + * @head: head of the list > >> + * @last_test: name of the last test > >> + * @log: bool indicating if the delete operation should be logged > >> + * > >> + * Delete a fact from the list. > >> + * > >> + * Returns: bool indicating if fact was deleted from the list > >> + * > >> + */ > >> +static bool igt_facts_list_del(const char *name, > >> + struct igt_list_head *head, > >> + const char *last_test, > >> + bool log) > >> +{ > >> + igt_fact *fact = NULL; > >> + > >> + if (igt_list_empty(head)) > >> + return false; > >> + > >> + igt_list_for_each_entry(fact, head, link) { > >> + if (strcmp(fact->name, name) == 0) { > >> + if (log) > >> + igt_facts_log(last_test, fact->name, > >> + NULL, fact->value); > >> + > >> + igt_list_del(&fact->link); > >> + free(fact->name); > >> + free(fact->value); > >> + free(fact->last_test); > >> + free(fact); > >> + return true; > >> + } > >> + } > >> + return false; > >> +} > >> + > >> +/** > >> + * igt_facts_list_add: > >> + * @name: name of the fact to be added > >> + * @value: value of the fact to be added > >> + * @last_test: name of the last test > >> + * @head: head of the list > >> + * > >> + * Returns: bool indicating if fact was added to the list > >> + * > >> + */ > >> +static bool igt_facts_list_add(const char *name, > >> + const char *value, > >> + const char *last_test, > >> + struct igt_list_head *head) > >> +{ > >> + igt_fact *new_fact = NULL, *old_fact = NULL; > >> + bool logged = false; > >> + > >> + if (name == NULL || value == NULL) > >> + return false; > >> + > >> + old_fact = igt_facts_list_get(name, head); > >> + if (old_fact) { > >> + if (strcmp(old_fact->value, value) == 0) { > >> + old_fact->present = true; > >> + return false; > >> + } > >> + igt_facts_log(last_test, name, value, old_fact->value); > >> + logged = true; > >> + igt_facts_list_del(name, head, last_test, false); > >> + } > >> + > >> + new_fact = malloc(sizeof(igt_fact)); > >> + if (new_fact == NULL) > >> + return false; > >> + > >> + new_fact->name = strdup(name); > >> + new_fact->value = strdup(value); > >> + new_fact->last_test = last_test ? strdup(last_test) : NULL; > >> + new_fact->present = true; > >> + > >> + if (!logged) > >> + igt_facts_log(last_test, name, value, NULL); > >> + > >> + igt_list_add(&new_fact->link, head); > >> + > >> + return true; > >> +} > >> + > >> +/** > >> + * igt_facts_list_mark: > >> + * @head: head of the list > >> + * > >> + * Mark all facts in the list as not present. Opted for the mark and sweep > >> + * design pattern due to its simplicity and efficiency. > >> + * > >> + * Returns: void > >> + */ > >> +static void igt_facts_list_mark(struct igt_list_head *head) > >> +{ > >> + igt_fact *fact = NULL; > >> + > >> + if (igt_list_empty(head)) > >> + return; > >> + > >> + igt_list_for_each_entry(fact, head, link) > >> + fact->present = false; > >> +} > >> + > >> +/** > >> + * igt_facts_list_sweep: > >> + * @head: head of the list > >> + * @last_test: name of the last test > >> + * > >> + * Sweep the list and delete all facts that are not present. Opted for the mark > >> + * and sweep design pattern due to its simplicity and efficiency. > >> + * > >> + * Returns: void > >> + */ > >> +static void igt_facts_list_sweep(struct igt_list_head *head, > >> + const char *last_test) > >> +{ > >> + igt_fact *fact = NULL, *tmp = NULL; > >> + > >> + if (igt_list_empty(head)) > >> + return; > >> + > >> + igt_list_for_each_entry_safe(fact, tmp, head, link) > >> + if (!fact->present) > >> + igt_facts_list_del(fact->name, head, last_test, true); > >> +} > >> + > >> +/** > >> + * igt_facts_list_mark_and_sweep: > >> + * @head: head of the list > >> + * > >> + * Clean up the list using mark and sweep. Opted for the mark and sweep > >> + * design pattern due to its simplicity and efficiency. > >> + * > >> + * Returns: void > >> + */ > >> +static void igt_facts_list_mark_and_sweep(struct igt_list_head *head) > >> +{ > >> + igt_facts_list_mark(head); > >> + igt_facts_list_sweep(head, NULL); > >> +} > >> + > >> +/** > >> + * igt_facts_are_all_lists_empty: > >> + * > >> + * Returns true if all lists are empty. Used by the tool lsfacts. > >> + * > >> + * Returns: bool > >> + */ > >> +bool igt_facts_are_all_lists_empty(void) > >> +{ > >> + return igt_list_empty(&igt_facts_list_drm_card_head) && > >> + igt_list_empty(&igt_facts_list_kmod_head) && > >> + igt_list_empty(&igt_facts_list_ktaint_head) && > >> + igt_list_empty(&igt_facts_list_pci_gpu_head); > >> +} > >> + > >> +/** > >> + * igt_facts_scan_pci_gpus: > >> + * @last_test: name of the last test > >> + * > >> + * This function scans the pci bus for gpus using udev. It uses > >> + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to > >> + * update igt_facts_list_pci_gpu_head. > >> + * > >> + * Returns: void > >> + */ > >> +static void igt_facts_scan_pci_gpus(const char *last_test) > >> +{ > >> + static struct igt_list_head *head = &igt_facts_list_pci_gpu_head; > >> + struct udev *udev = NULL; > >> + struct udev_enumerate *enumerate = NULL; > >> + struct udev_list_entry *devices, *dev_list_entry; > >> + struct igt_device_card card; > >> + char pcistr[10]; > >> + int ret; > >> + char *factname = NULL; > >> + char *factvalue = NULL; > >> + > >> + udev = udev_new(); > >> + if (!udev) { > >> + igt_warn("Failed to create udev context\n"); > >> + return; > >> + } > >> + > >> + enumerate = udev_enumerate_new(udev); > >> + if (!enumerate) { > >> + igt_warn("Failed to create udev enumerate\n"); > >> + udev_unref(udev); > >> + return; > >> + } > >> + > >> + ret = udev_enumerate_add_match_subsystem(enumerate, "pci"); > >> + if (ret < 0) > >> + goto out; > >> + > >> + ret = udev_enumerate_add_match_property(enumerate, > >> + "PCI_CLASS", > >> + "30000"); > >> + if (ret < 0) > >> + goto out; > >> + > >> + ret = udev_enumerate_add_match_property(enumerate, > >> + "PCI_CLASS", > >> + "38000"); > >> + if (ret < 0) > >> + goto out; > >> + > >> + ret = udev_enumerate_scan_devices(enumerate); > >> + if (ret < 0) > >> + goto out; > >> + > >> + devices = udev_enumerate_get_list_entry(enumerate); > >> + if (!devices) > >> + goto out; > >> + > >> + igt_facts_list_mark(head); > >> + > >> + udev_list_entry_foreach(dev_list_entry, devices) { > >> + const char *path; > >> + struct udev_device *udev_dev; > >> + struct udev_list_entry *entry; > >> + char *model = NULL; > >> + char *codename = NULL; > >> + igt_fact *old_fact = NULL; > >> + > >> + path = udev_list_entry_get_name(dev_list_entry); > >> + udev_dev = udev_device_new_from_syspath(udev, path); > >> + if (!udev_dev) > >> + continue; > >> + > >> + /* Strip path to only the content after the last / */ > >> + path = strrchr(path, '/'); > >> + if (path) > >> + path++; > >> + else > >> + path = "unknown"; > >> + > >> + strcpy(card.pci_slot_name, "-"); > >> + > >> + entry = udev_device_get_properties_list_entry(udev_dev); > >> + while (entry) { > >> + const char *name = udev_list_entry_get_name(entry); > >> + const char *value = udev_list_entry_get_value(entry); > >> + > >> + entry = udev_list_entry_get_next(entry); > >> + if (!strcmp(name, "ID_MODEL_FROM_DATABASE")) > >> + model = strdup(value); > >> + else if (!strcmp(name, "PCI_ID")) > >> + igt_assert_eq(sscanf(value, "%hx:%hx", > >> + &card.pci_vendor, > >> + &card.pci_device), 2); > >> + } > >> + snprintf(pcistr, sizeof(pcistr), "%04x:%04x", > >> + card.pci_vendor, card.pci_device); > >> + codename = igt_device_get_pretty_name(&card, false); > >> + > >> + /* Set codename to null if it is the same string as pci_id */ > >> + if (codename && strcmp(pcistr, codename) == 0) { > >> + free(codename); > >> + codename = NULL; > >> + } > >> + asprintf(&factname, "%s.%s", pci_gpu_fact, path); > >> + asprintf(&factvalue, > >> + "%s %s %s", > >> + pcistr, > >> + codename ? codename : "", > >> + model ? model : ""); > >> + > >> + /** > >> + * Loading and unloading the kmod may change the human > >> + * readeable string in value. Do not change value if the > >> + * pci id is the same. > >> + */ > >> + old_fact = igt_facts_list_get(factname, head); > >> + if (old_fact && strncmp(old_fact->value, factvalue, 9) == 0) > >> + old_fact->present = true; > >> + else > >> + igt_facts_list_add(factname, factvalue, last_test, head); > >> + > >> + free(codename); > >> + free(model); > >> + free(factname); > >> + free(factvalue); > >> + udev_device_unref(udev_dev); > >> + } > >> + > >> + igt_facts_list_sweep(head, last_test); > >> + > >> +out: > >> + udev_enumerate_unref(enumerate); > >> + udev_unref(udev); > >> +} > >> + > >> +/** > >> + * igt_facts_scan_pci_drm_cards: > >> + * @last_test: name of the last test > >> + * > >> + * This function scans the pci bus for drm cards using udev. It uses the > >> + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to > >> + * update igt_facts_list_drm_card_head. > >> + * > >> + * Returns: void > >> + */ > >> +static void igt_facts_scan_pci_drm_cards(const char *last_test) > >> +{ > >> + static struct igt_list_head *head = &igt_facts_list_drm_card_head; > >> + struct udev *udev = NULL; > >> + struct udev_enumerate *enumerate = NULL; > >> + struct udev_list_entry *devices, *dev_list_entry; > >> + int ret; > >> + char *factname = NULL; > >> + char *factvalue = NULL; > >> + > >> + udev = udev_new(); > >> + if (!udev) > >> + return; > >> + > >> + enumerate = udev_enumerate_new(udev); > >> + if (!enumerate) { > >> + udev_unref(udev); > >> + return; > >> + } > >> + > >> + ret = udev_enumerate_add_match_subsystem(enumerate, "drm"); > >> + if (ret < 0) > >> + goto out; > >> + > >> + ret = udev_enumerate_scan_devices(enumerate); > >> + if (ret < 0) > >> + goto out; > >> + > >> + devices = udev_enumerate_get_list_entry(enumerate); > >> + if (!devices) > >> + goto out; > >> + > >> + ret = udev_enumerate_add_match_subsystem(enumerate, "drm"); > >> + if (ret < 0) > >> + goto out; > >> + > >> + ret = udev_enumerate_scan_devices(enumerate); > >> + if (ret < 0) > >> + goto out; > >> + > >> + devices = udev_enumerate_get_list_entry(enumerate); > >> + if (!devices) > >> + goto out; > >> + > >> + igt_facts_list_mark(head); > >> + > >> + udev_list_entry_foreach(dev_list_entry, devices) { > >> + const char *path; > >> + struct udev_device *drm_dev, *pci_dev; > >> + const char *drm_name, *pci_addr; > >> + > >> + path = udev_list_entry_get_name(dev_list_entry); > >> + drm_dev = udev_device_new_from_syspath(udev, path); > >> + if (!drm_dev) > >> + continue; > >> + > >> + drm_name = udev_device_get_sysname(drm_dev); > >> + /* Filter the device by name. Want devices such as card0 and card1. > >> + * If the device has '-' in the name, contine > >> + */ > >> + if (strncmp(drm_name, "card", 4) != 0 || > >> + strchr(drm_name, '-') != NULL) { > >> + udev_device_unref(drm_dev); > >> + continue; > >> + } > >> + > >> + /* Get the pci address of the gpu associated with the drm_dev*/ > >> + pci_dev = udev_device_get_parent_with_subsystem_devtype(drm_dev, > >> + "pci", > >> + NULL); > >> + if (pci_dev) { > >> + pci_addr = udev_device_get_sysattr_value(pci_dev, > >> + "address"); > >> + if (!pci_addr) > >> + pci_addr = udev_device_get_sysname(pci_dev); > >> + } else { > >> + /* Some GPUs are platform devices. Ignore them. */ > >> + pci_addr = NULL; > >> + udev_device_unref(drm_dev); > >> + continue; > >> + } > >> + > >> + asprintf(&factname, "%s.%s", drm_card_fact, pci_addr); > >> + asprintf(&factvalue, "%s", drm_name); > >> + > >> + igt_facts_list_add(factname, factvalue, last_test, head); > >> + > >> + free(factname); > >> + free(factvalue); > >> + udev_device_unref(drm_dev); > >> + } > >> + > >> + igt_facts_list_sweep(head, last_test); > >> + > >> +out: > >> + udev_enumerate_unref(enumerate); > >> + udev_unref(udev); > >> +} > >> + > >> +/** > >> + * igt_facts_scan_kernel_taints: > >> + * @last_test: name of the last test > >> + * > >> + * This function scans for kernel taints using igt_kernel_tainted() and > >> + * igt_explain_taints(). It will cut off the explanation keeping only the > >> + * taint name. > >> + * > >> + * Returns: void > >> + */ > >> +static void igt_facts_scan_kernel_taints(const char *last_test) > >> +{ > >> + static struct igt_list_head *head = &igt_facts_list_ktaint_head; > >> + unsigned long taints = 0; > >> + const char *reason = NULL; > >> + char *taint_name = NULL; > >> + char *fact_name = NULL; > >> + > >> + taints = igt_kernel_tainted(&taints); > >> + /* For testing, set all bits to 1 > >> + * taints = 0xFFFFFFFF; > >> + */ > >> + > >> + > >> + igt_facts_list_mark(head); > >> + > >> + while ((reason = igt_explain_taints(&taints)) != NULL) { > >> + /* Cut at the ':' to get only the taint name */ > >> + taint_name = strtok(strdup(reason), ":"); > >> + if (!taint_name) > >> + continue; > >> + > >> + /* Lowercase taint_name */ > >> + for (int i = 0; taint_name[i]; i++) > >> + taint_name[i] = tolower(taint_name[i]); > >> + > >> + asprintf(&fact_name, "%s.%s", ktaint_fact, taint_name); > >> + igt_facts_list_add(fact_name, "true", last_test, head); > >> + > >> + free(taint_name); > >> + free(fact_name); > >> + } > >> + > >> + igt_facts_list_sweep(head, last_test); > >> +} > >> + > >> + > >> +/** > >> + * igt_facts_scan_kernel_loaded_kmods: > >> + * @last_test: name of the last test > >> + * > >> + * This function scans for loaded kmods using igt_fact_kmod_list and > >> + * igt_kmod_is_loaded(). > >> + * > >> + * Returns: void > >> + */ > >> +static void igt_facts_scan_kernel_loaded_kmods(const char *last_test) > >> +{ > >> + static struct igt_list_head *head = &igt_facts_list_kmod_head; > >> + char *name = NULL; > >> + > >> + igt_facts_list_mark(head); > >> + > >> + /* Iterate over igt_fact_kmod_list[] until the element contains "\0" */ > >> + for (int i = 0; strcmp(igt_fact_kmod_list[i], "\0") != 0; i++) { > >> + asprintf(&name, "%s.%s", kmod_fact, igt_fact_kmod_list[i]); > >> + if (igt_kmod_is_loaded(igt_fact_kmod_list[i])) > >> + igt_facts_list_add(name, "true", last_test, head); > >> + > >> + free(name); > >> + } > >> + > >> + igt_facts_list_sweep(head, last_test); > >> +} > >> + > >> +/** > >> + * igt_facts: > >> + * @last_test: name of the last test > >> + * > >> + * Call this function where you want to gather and report facts. > >> + * > >> + * Returns: void > >> + */ > >> +void igt_facts(const char *last_test) > >> +{ > >> + igt_facts_scan_pci_gpus(last_test); > >> + igt_facts_scan_pci_drm_cards(last_test); > >> + igt_facts_scan_kernel_taints(last_test); > >> + igt_facts_scan_kernel_loaded_kmods(last_test); > >> + > >> + fflush(stdout); > >> + fflush(stderr); > >> +} > >> + > >> +/* > >> + * Testing > >> + * > >> + * Defined here to keep most of the functions static > >> + * > >> + */ > >> + > >> +/** > >> + * igt_facts_test_add_get: > >> + * @head: head of the list > >> + * > >> + * Tests igt_facts_list_add and igt_facts_list_get. > >> + * > >> + * Returns: void > >> + */ > >> +static void igt_facts_test_add_get(struct igt_list_head *head) > >> +{ > >> + igt_fact *fact = NULL; > >> + bool ret; > >> + const char *name = "hardware.pci.gpu_at_addr.0000:00:02.0"; > >> + const char *value = "8086:64a0 Intel Lunarlake (Gen20)"; > >> + const char *last_test = NULL; > >> + > >> + ret = igt_facts_list_add(name, value, last_test, head); > >> + igt_assert(ret == true); > >> + > >> + // Assert that there is one element in the linked list > >> + igt_assert_eq(igt_list_length(head), 1); > >> + > >> + // Assert that the element in the linked list is the one we added > >> + fact = igt_facts_list_get(name, head); > >> + igt_assert(fact != NULL); > >> + igt_assert_eq(strcmp(fact->name, name), 0); > >> + igt_assert_eq(strcmp(fact->value, value), 0); > >> + igt_assert(fact->present == true); > >> + igt_assert(fact->last_test == NULL); > >> +} > >> + > >> +/** > >> + * igt_facts_test_mark_and_sweep: > >> + * @head: head of the list > >> + * > >> + * - Add 3 elements to the list and mark them as not present. > >> + * - Update two of the elements and mark them as present. > >> + * - Sweep the list and assert that > >> + * - Only the two updated elements are present > >> + * - The third element was deleted > >> + * > >> + * Returns: void > >> + */ > >> +static void igt_facts_test_mark_and_sweep(struct igt_list_head *head) > >> +{ > >> + igt_fact *fact = NULL; > >> + const char *name1 = "hardware.pci.gpu_at_addr.0000:00:02.0"; > >> + const char *value1 = "8086:64a0 Intel Lunarlake (Gen20)"; > >> + const char *name2 = "hardware.pci.gpu_at_addr.0000:00:03.0"; > >> + const char *value2 = "8086:64a1 Intel Lunarlake (Gen21)"; > >> + const char *name3 = "hardware.pci.gpu_at_addr.0000:00:04.0"; > >> + const char *value3 = "8086:64a2 Intel Lunarlake (Gen22)"; > >> + > >> + igt_facts_list_add(name1, value1, NULL, head); > >> + igt_facts_list_add(name2, value2, NULL, head); > >> + igt_facts_list_add(name3, value3, NULL, head); > >> + > >> + igt_facts_list_mark(head); > >> + > >> + igt_facts_list_add(name1, value1, NULL, head); > >> + igt_facts_list_add(name2, value2, NULL, head); > >> + > >> + igt_facts_list_sweep(head, NULL); > >> + > >> + // Assert that there are two elements in the linked list > >> + igt_assert_eq(igt_list_length(head), 2); > >> + > >> + // Assert that the two updated elements are present > >> + fact = igt_facts_list_get(name1, head); > >> + igt_assert(fact != NULL); > >> + igt_assert(fact->present == true); > >> + > >> + fact = igt_facts_list_get(name2, head); > >> + igt_assert(fact != NULL); > >> + igt_assert(fact->present == true); > >> + > >> + // Assert that the third element was deleted > >> + fact = igt_facts_list_get(name3, head); > >> + igt_assert(fact == NULL); > >> +} > >> + > >> +/** > >> + * igt_facts_test: > >> + * > >> + * Main function for testing the igt_facts module > >> + * > >> + * Returns: bool indicating if the tests passed > >> + */ > >> +void igt_facts_test(void) > >> +{ > >> + const char *last_test = "Unit Testing"; > >> + > >> + igt_facts_lists_init(); > >> + > >> + /* Assert that all lists are empty */ > >> + igt_assert(igt_list_empty(&igt_facts_list_kmod_head)); > >> + igt_assert(igt_list_empty(&igt_facts_list_ktaint_head)); > >> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head)); > >> + igt_assert(igt_list_empty(&igt_facts_list_drm_card_head)); > >> + > >> + /* Assert that add and get work. Will add one element to the list */ > >> + igt_facts_test_add_get(&igt_facts_list_pci_gpu_head); > >> + > >> + /* Assert that igt_facts_list_mark_and_sweep() cleans up the list */ > >> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == false); > >> + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head); > >> + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == true); > >> + > >> + /* Test the mark and sweep pattern used to delete elements > >> + * from the list > >> + */ > >> + igt_facts_test_mark_and_sweep(&igt_facts_list_pci_gpu_head); > >> + > >> + /* Clean up the list and call igt_facts(). This should not crash */ > >> + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head); > >> + igt_facts(last_test); > >> +} > >> diff --git a/lib/igt_facts.h b/lib/igt_facts.h > >> new file mode 100644 > >> index 000000000..e4adca3fb > >> --- /dev/null > >> +++ b/lib/igt_facts.h > >> @@ -0,0 +1,47 @@ > >> +/* SPDX-License-Identifier: MIT > >> + * Copyright © 2024 Intel Corporation > >> + */ > >> + > >> +#include <stdbool.h> > >> + > >> +#include "igt_list.h" > >> + > >> + > >> +/* igt_fact: > >> + * @name: name of the fact > >> + * @value: value of the fact > >> + * @last_test: name of the test that triggered the fact > >> + * @present: bool indicating if fact is present. Used for deleting facts from > >> + * the list. > >> + * @link: link to the next fact > >> + * > >> + * A fact is a piece of information that can be used to determine the state of > >> + * the system. > >> + * > >> + */ > >> +typedef struct { > >> + char *name; > >> + char *value; > >> + char *last_test; > >> + bool present; /* For mark and seep */ > >> + struct igt_list_head link; > >> +} igt_fact; > >> + > >> +const char *igt_fact_kmod_list[] = { > >> + "amdgpu", > >> + "i915", > >> + "nouveau", > >> + "radeon", > >> + "xe", > >> + "\0" > >> +}; > >> + > >> +const char *kmod_fact = "kernel.kmod_is_loaded"; /* true or false */ > >> +const char *ktaint_fact = "kernel.is_tainted"; /* taint name: taint_warn */ > >> +const char *pci_gpu_fact = "hardware.pci.gpu_at_addr"; /* id vendor model */ > >> +const char *drm_card_fact = "hardware.pci.drm_card_at_addr"; /* cardX */ > >> + > >> +void igt_facts_lists_init(void); > >> +void igt_facts(const char *last_test); > >> +bool igt_facts_are_all_lists_empty(void); > >> +void igt_facts_test(void); /* For unit testing only */ > >> diff --git a/lib/meson.build b/lib/meson.build > >> index c3556a921..c44ca2b5a 100644 > >> --- a/lib/meson.build > >> +++ b/lib/meson.build > >> @@ -18,6 +18,7 @@ lib_sources = [ > >> 'i915/i915_crc.c', > >> 'igt_collection.c', > >> 'igt_color_encoding.c', > >> + 'igt_facts.c', > >> 'igt_crc.c', > >> 'igt_debugfs.c', > >> 'igt_device.c', > >> diff --git a/lib/tests/igt_facts.c b/lib/tests/igt_facts.c > >> new file mode 100644 > >> index 000000000..7fa9d0f22 > >> --- /dev/null > >> +++ b/lib/tests/igt_facts.c > >> @@ -0,0 +1,15 @@ > >> +// SPDX-License-Identifier: MIT > >> +// Copyright © 2024 Intel Corporation > >> + > >> +#include <stdbool.h> > >> + > >> +#include "igt_core.h" > >> +#include "igt_facts.h" > >> + > >> +/* Tests are not defined here so we can keep most of the functions static */ > >> + > >> +igt_simple_main > >> +{ > >> + igt_info("Running igt_facts_test\n"); > >> + igt_facts_test(); > >> +} > >> diff --git a/lib/tests/meson.build b/lib/tests/meson.build > >> index df8092638..1ce19f63c 100644 > >> --- a/lib/tests/meson.build > >> +++ b/lib/tests/meson.build > >> @@ -8,6 +8,7 @@ lib_tests = [ > >> 'igt_dynamic_subtests', > >> 'igt_edid', > >> 'igt_exit_handler', > >> + 'igt_facts', > >> 'igt_fork', > >> 'igt_fork_helper', > >> 'igt_hook', > >> diff --git a/runner/executor.c b/runner/executor.c > >> index ac73e1dde..d1eca3c05 100644 > >> --- a/runner/executor.c > >> +++ b/runner/executor.c > >> @@ -30,6 +30,7 @@ > >> > >> #include "igt_aux.h" > >> #include "igt_core.h" > >> +#include "igt_facts.h" > >> #include "igt_taints.h" > >> #include "igt_vec.h" > >> #include "executor.h" > >> @@ -2306,6 +2307,9 @@ bool execute(struct execute_state *state, > >> sigset_t sigmask; > >> double time_spent = 0.0; > >> bool status = true; > >> + char *last_test = NULL; > >> + > >> + igt_facts_lists_init(); > >> > >> if (state->dry) { > >> outf("Dry run, not executing. Invoke igt_resume if you want to execute.\n"); > >> @@ -2438,6 +2442,10 @@ bool execute(struct execute_state *state, > >> int result; > >> bool already_written = false; > >> > >> + /* Calls before running each test */ > >> + igt_facts(last_test); > >> + last_test = entry_display_name(&job_list->entries[state->next]); > >> + > >> if (should_die_because_signal(sigfd)) { > >> status = false; > >> goto end; > >> @@ -2526,6 +2534,8 @@ bool execute(struct execute_state *state, > >> return execute(state, settings, job_list); > >> } > >> } > >> + /* Last call to collect facts after the last test runs */ > >> + igt_facts(last_test); > >> > >> if ((timefd = openat(resdirfd, "endtime.txt", O_CREAT | O_WRONLY | O_EXCL, 0666)) >= 0) { > >> dprintf(timefd, "%f\n", timeofday_double()); > >> diff --git a/tools/lsfacts.c b/tools/lsfacts.c > >> new file mode 100644 > >> index 000000000..10dee0317 > >> --- /dev/null > >> +++ b/tools/lsfacts.c > >> @@ -0,0 +1,25 @@ > >> +// SPDX-License-Identifier: MIT > >> +// Copyright © 2024 Intel Corporation > >> + > >> +#include "igt.h" > >> +#include "igt_facts.h" > >> + > >> +/** > >> + * SECTION:lsfacts > >> + * @short_description: lsfacts > >> + * @title: lsfacts > >> + * @include: lsfacts.c > >> + * > >> + * # lsfacts > >> + * > >> + * Scan for igt-facts and print them on screen. Indicate if no facts are found. > >> + */ > >> +int main(int argc, char *argv[]) > >> +{ > >> + igt_facts_lists_init(); > >> + > >> + igt_facts("lsfacts"); > >> + > >> + if (igt_facts_are_all_lists_empty()) > >> + igt_info("No facts found...\n"); > >> +} > >> diff --git a/tools/meson.build b/tools/meson.build > >> index 48c9a4b50..ff1b0ef90 100644 > >> --- a/tools/meson.build > >> +++ b/tools/meson.build > >> @@ -42,6 +42,7 @@ tools_progs = [ > >> 'intel_gem_info', > >> 'intel_gvtg_test', > >> 'dpcd_reg', > >> + 'lsfacts', > >> 'lsgpu', > >> 'power', > >> ] > >> > > > > > > > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking 2024-12-09 9:17 ` Janusz Krzysztofik @ 2024-12-09 11:06 ` Peter Senna Tschudin 2024-12-09 13:50 ` Janusz Krzysztofik 0 siblings, 1 reply; 13+ messages in thread From: Peter Senna Tschudin @ 2024-12-09 11:06 UTC (permalink / raw) To: Janusz Krzysztofik, igt-dev@lists.freedesktop.org Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi, luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray, dominik.karol.piatkowski, katarzyna.piecielska Hi Janusz, On 09.12.2024 10:17, Janusz Krzysztofik wrote: > On Friday, 6 December 2024 06:45:31 CET Peter Senna Tschudin wrote: >> Hi Janusz, >> >> Thank you for your detailed comments. I appreciate the opportunity >> to clarify and address your concerns. >> >> On 05.12.2024 15:05, Janusz Krzysztofik wrote: >>> Hi Peter, >>> >>> On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote: >>>> When using igt-runner, collect facts before each test and after the >>>> last test, and report when facts change. The facts are: >>>> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20) >>>> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1 >>>> - Kernel taints: kernel.is_tainted.taint_warn: true >>>> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true >>>> >>>> This change imposes little execution overhead and adds just a few >>>> lines of logging. The facts will be printed on normal igt-runner >>>> output. Here is a real example from our CI shwoing >>>> hotreplug-lateclose changing the DRM card number >>> >>> Since you give that as an example of how helpful your facts can be, and follow >>> that with a kernel taint example, that may indicate you think, and users of >>> your facts may then be mislead having that read, that the taint was related to >>> the change of card number, while both had nothing to do with each other. >> >> Let’s take a step back to define the purpose and scope of igt-facts: >> - Definition of a fact from the dictionary: A fact is an objectively verifiable >> piece of information. >> - Purpose of igt-facts: Track which tests cause changes to the facts. >> >> The operation is straightforward: facts are collected before and after each test, >> and any differences are logged. Here’s an example showing a fact change and a new >> fact after running hotreplug-lateclose: >> >> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 >> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true >> >> This output highlights the facts without implying causation between them. The >> tool(and my commit message) neither explains relationships between facts nor >> misleads users into assuming causation. > > For me your commit message does. Can you please provide some external support for your claims? I am very careful while writing, and I am aware of my own limitations. After reading what chatGPT wrote about my text, I felt comfortable about not misleading anyone. Please mind the wording of my question: I did not hint a desired outcome for chatGPT. My Question: -- // -- Can you please indicate any potential causality relationship that the following text may offer? Here is a real example from our CI shwoing hotreplug-lateclose changing the DRM card number and tainting the kernel on the abort path -- // -- chatGPT answer: --//-- The text suggests a potential causality relationship between the "hotreplug-lateclose" operation and its effects on the system: 1. Causal Event: The "hotreplug-lateclose" operation. This likely refers to a process in the CI(Continuous Integration) environment related to hotplugging devices and managing late-close scenarios. Hotplugging involves dynamically adding or removing hardware components without shutting down the system. 2. Effects: * Changing the DRM card number: This implies that the operation affects the enumeration or reordering of DRM (Direct Rendering Manager) devices, causing the card number to be reassigned. * Kernel Tainting on the abort path: Kernel tainting typically indicates that the kernel has entered a state that is not guaranteed to be stable or reliable. This could occur due to improper handling of errors or unexpected conditions during the "abort path" (likely a failure or cleanup process in the operation). Suggested Causal Chain: 1. The "hotreplug-lateclose" operation executes in the CI. 2. This causes a change in DRM card enumeration (potentially due to how devices are initialized or re-initialized). 3. On the abort path of this operation, a condition arises that taints the kernel (possibly due to resource cleanup issues, unhandled errors, or driver misbehavior). This text highlights a possible link between hotreplug-lateclose handling and instabilities in DRM device management as well as kernel state integrity. --//-- > > Can you please provide a full list of "facts" your code is supposed to handle? This is in the commit message already, at the very begining. > Can you please explain why you selected just those "facts", not others? It was either what was missing, such as a convenient way to learn when something strange happend as a gpu disappearing from the PCI bus, or something that we believe may cause errors downstream such as a taint, and the list of loaded modules. For the drm card number association, we belive that there may be a caching issue: we are trying to figure it out if the drm-reopen cache handles the change of drm number association well. We have weak information pointing to a probable problem akin to missing cache invalidation. Thanks! [...] ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking 2024-12-09 11:06 ` Peter Senna Tschudin @ 2024-12-09 13:50 ` Janusz Krzysztofik 2024-12-10 8:38 ` Peter Senna Tschudin 0 siblings, 1 reply; 13+ messages in thread From: Janusz Krzysztofik @ 2024-12-09 13:50 UTC (permalink / raw) To: igt-dev@lists.freedesktop.org, Peter Senna Tschudin Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi, luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray, dominik.karol.piatkowski, katarzyna.piecielska On Monday, 9 December 2024 12:06:33 CET Peter Senna Tschudin wrote: > Hi Janusz, > > On 09.12.2024 10:17, Janusz Krzysztofik wrote: > > On Friday, 6 December 2024 06:45:31 CET Peter Senna Tschudin wrote: > >> Hi Janusz, > >> > >> Thank you for your detailed comments. I appreciate the opportunity > >> to clarify and address your concerns. > >> > >> On 05.12.2024 15:05, Janusz Krzysztofik wrote: > >>> Hi Peter, > >>> > >>> On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote: > >>>> When using igt-runner, collect facts before each test and after the > >>>> last test, and report when facts change. The facts are: > >>>> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20) > >>>> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1 > >>>> - Kernel taints: kernel.is_tainted.taint_warn: true > >>>> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true > >>>> > >>>> This change imposes little execution overhead and adds just a few > >>>> lines of logging. The facts will be printed on normal igt-runner > >>>> output. Here is a real example from our CI shwoing > >>>> hotreplug-lateclose changing the DRM card number > >>> > >>> Since you give that as an example of how helpful your facts can be, and follow > >>> that with a kernel taint example, that may indicate you think, and users of > >>> your facts may then be mislead having that read, that the taint was related to > >>> the change of card number, while both had nothing to do with each other. > >> > >> Let’s take a step back to define the purpose and scope of igt-facts: > >> - Definition of a fact from the dictionary: A fact is an objectively verifiable > >> piece of information. > >> - Purpose of igt-facts: Track which tests cause changes to the facts. > >> > >> The operation is straightforward: facts are collected before and after each test, > >> and any differences are logged. Here’s an example showing a fact change and a new > >> fact after running hotreplug-lateclose: > >> > >> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 > >> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true > >> > >> This output highlights the facts without implying causation between them. The > >> tool(and my commit message) neither explains relationships between facts nor > >> misleads users into assuming causation. > > > > For me your commit message does. > > Can you please provide some external support for your claims? > > I am very careful while writing, and I am aware of my own limitations. > After reading what chatGPT wrote about my text, I felt comfortable > about not misleading anyone. Please mind the wording of my > question: I did not hint a desired outcome for chatGPT. > > My Question: > -- // -- > Can you please indicate any potential causality relationship > that the following text may offer? > > Here is a real example from our CI shwoing hotreplug-lateclose > changing the DRM card number and tainting the kernel on the > abort path > -- // -- > > chatGPT answer: > --//-- > The text suggests a potential causality relationship between the > "hotreplug-lateclose" operation and its effects on the system: > > 1. Causal Event: The "hotreplug-lateclose" operation. > This likely refers to a process in the CI(Continuous Integration) > environment related to hotplugging devices and managing late-close > scenarios. Hotplugging involves dynamically adding or removing > hardware components without shutting down the system. > > 2. Effects: > > * Changing the DRM card number: This implies that the operation > affects the enumeration or reordering of DRM (Direct Rendering > Manager) devices, causing the card number to be reassigned. > > * Kernel Tainting on the abort path: Kernel tainting typically > indicates that the kernel has entered a state that is not > guaranteed to be stable or reliable. This could occur due > to improper handling of errors or unexpected conditions > during the "abort path" (likely a failure or cleanup process > in the operation). > > Suggested Causal Chain: > 1. The "hotreplug-lateclose" operation executes in the CI. > 2. This causes a change in DRM card enumeration (potentially due > to how devices are initialized or re-initialized). > 3. On the abort path of this operation, a condition arises that > taints the kernel (possibly due to resource cleanup issues, > unhandled errors, or driver misbehavior). > > This text highlights a possible link between hotreplug-lateclose > handling and instabilities in DRM device management as well as > kernel state integrity. > --//-- > > > > > > > Can you please provide a full list of "facts" your code is supposed to handle? > > This is in the commit message already, at the very begining. > > > Can you please explain why you selected just those "facts", not others? > > It was either what was missing, such as a convenient way to learn when > something strange happend as a gpu disappearing from the PCI bus, or > something that we believe may cause errors downstream such as a taint, > and the list of loaded modules. > > For the drm card number association, we belive that there may be a caching > issue: we are trying to figure it out if the drm-reopen cache handles the > change of drm number association well. We have weak information pointing > to a probable problem akin to missing cache invalidation. Let's have a look at CI reports from igt@core_hotunplug@hotre* subtests that resulted in kernel taint: https://gfx-ci.igk.intel.com/cibuglog-ng/results/all?query=test_name%20~=%20%22^igt@core_hotunplug@hotre%22%20AND%20runconfig_tag%20CONTAINS%20%22xe%22%20AND%20status_name%20NOT%20IN%20[%22notrun%22,%22pass%22,%22skip%22]%20AND%20dmesg%20~=%20%22[tT]aint%22 Here are three cases from the top of the list at the time of this writing: 1. https://intel-gfx-ci.01.org/tree/intel-xe/IGT_8142/shard-bmg-5/igt@core_hotunplug@hotrebind-lateclose.html stdout: ... Opened device: /dev/dri/card0 ... Opened device: /dev/dri/renderD128 ... Opened device: /dev/dri/card1 Opened device: /dev/dri/renderD129 dmesg: ... <6> [339.587085] [IGT] core_hotunplug: starting subtest hotrebind-lateclose ... <4> [340.531229] WARNING: CPU: 8 PID: 45 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe] ... <4> [340.532206] CPU: 8 UID: 0 PID: 45 Comm: kworker/8:0 Tainted: G W 6.13.0-rc1-xe+ #1 <4> [340.532221] Tainted: [W]=WARN ... <7> [340.617993] [IGT] core_hotunplug: opening DRM device for hot rebind ... <4> [340.929113] WARNING: CPU: 2 PID: 4067 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe] ... <4> [340.930040] CPU: 2 UID: 0 PID: 4067 Comm: core_hotunplug Tainted: G W 6.13.0-rc1-xe+ #1 <4> [340.930054] Tainted: [W]=WARN ... <4> [341.059764] WARNING: CPU: 2 PID: 4067 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe] ... <4> [341.059995] CPU: 2 UID: 0 PID: 4067 Comm: core_hotunplug Tainted: G W 6.13.0-rc1-xe+ #1 <4> [341.059999] Tainted: [W]=WARN ... <7> [341.348551] [IGT] core_hotunplug: rebinding the driver to the device ... <6> [341.614815] [drm] Initialized xe 1.1.0 for 0000:03:00.0 on minor 1 ... 2. https://intel-gfx-ci.01.org/tree/intel-xe/xe-2331-e57b4b7cd137e0ae00d2601b4b55ab7f504da422/shard-bmg-8/igt@core_hotunplug@hotrebind-lateclose.html stdout: ... Opened device: /dev/dri/card1 ... Opened device: /dev/dri/renderD129 ... Opened device: /dev/dri/card0 Opened device: /dev/dri/renderD128 ... dmesg: ... <7> [483.134426] [IGT] core_hotunplug: opening DRM device for hot rebind ... <4> [483.441514] WARNING: CPU: 8 PID: 5465 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe] ... <4> [483.442441] CPU: 8 UID: 0 PID: 5465 Comm: core_hotunplug Tainted: G W 6.13.0-rc1-xe+ #1 <4> [483.442456] Tainted: [W]=WARN ... <4> [483.586987] WARNING: CPU: 0 PID: 5465 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe] ... <4> [483.587210] CPU: 0 UID: 0 PID: 5465 Comm: core_hotunplug Tainted: G W 6.13.0-rc1-xe+ #1 <4> [483.587214] Tainted: [W]=WARN ... <7> [484.026293] [IGT] core_hotunplug: rebinding the driver to the device ... <6> [484.291135] [drm] Initialized xe 1.1.0 for 0000:03:00.0 on minor 0 ... 3. https://intel-gfx-ci.01.org/tree/intel-xe/xe-2330-784dd0b20e39add60971ccdd5d2f7f3c27cf37bb/shard-bmg-4/igt@core_hotunplug@hotrebind-lateclose.html stdout: ... Opened device: /dev/dri/card1 ... Opened device: /dev/dri/renderD129 ... Opened device: /dev/dri/card0 Opened device: /dev/dri/renderD128 ... dmesg: ... <6> [483.536630] [IGT] core_hotunplug: starting subtest hotrebind-lateclose ... <4> [484.402689] WARNING: CPU: 4 PID: 4167 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe] ... <4> [484.403663] CPU: 4 UID: 0 PID: 4167 Comm: kworker/4:7 Tainted: G W 6.13.0-rc1-xe+ #1 <4> [484.403677] Tainted: [W]=WARN ... <7> [484.563261] [IGT] core_hotunplug: closing health checked device instance ... <4> [484.641694] WARNING: CPU: 9 PID: 4887 at drivers/gpu/drm/i915/display/intel_display.c:416 intel_disable_transcoder+0x322/0x400 [xe] ... <4> [484.642621] CPU: 9 UID: 0 PID: 4887 Comm: kworker/9:4 Tainted: G W 6.13.0-rc1-xe+ #1 <4> [484.642635] Tainted: [W]=WARN ... <7> [484.725550] [IGT] core_hotunplug: closing health checked device instance ... <4> [484.873571] WARNING: CPU: 9 PID: 4887 at drivers/gpu/drm/i915/display/intel_display.c:416 intel_disable_transcoder+0x322/0x400 [xe] ... <4> [484.874656] CPU: 9 UID: 0 PID: 4887 Comm: kworker/9:4 Tainted: G W 6.13.0-rc1-xe+ #1 <4> [484.874672] Tainted: [W]=WARN ... <7> [484.959255] [IGT] core_hotunplug: opening DRM device for hot rebind ... <4> [485.126668] WARNING: CPU: 9 PID: 4887 at drivers/gpu/drm/i915/display/intel_display.c:416 intel_disable_transcoder+0x322/0x400 [xe] ... <4> [485.127401] CPU: 9 UID: 0 PID: 4887 Comm: kworker/9:4 Tainted: G W 6.13.0-rc1-xe+ #1 <4> [485.127413] Tainted: [W]=WARN ... <4> [485.309106] WARNING: CPU: 0 PID: 5276 at drivers/gpu/drm/i915/display/intel_display_power.c:580 __intel_display_power_put_domain+0x1c9/0x200 [xe] ... <4> [485.309290] CPU: 0 UID: 0 PID: 5276 Comm: core_hotunplug Tainted: G W 6.13.0-rc1-xe+ #1 <4> [485.309293] Tainted: [W]=WARN ... ... <7> [485.941797] [IGT] core_hotunplug: rebinding the driver to the device ... <6> [486.209963] [drm] Initialized xe 1.1.0 for 0000:03:00.0 on minor 0 ... In all the above reports I can see information on changed device minor number and kernel tainted. Now let's try to find an abort with no relevant information in dmesg (i.e., neither explicit info about kernel taint, nor BUG nor WARNING kernel messages -- events that mark the kernel tainted): https://gfx-ci.igk.intel.com/cibuglog-ng/results/all?query=test_name+~%3D+%22^igt%40core_hotunplug%40hotre%22+AND+status_name+IS+IN+[%22abort%22]+AND+NOT+dmesg+~%3D+%22[tT]aint|BUG|WARNING%22 I can't find any, can you? Let's also try to find reports from executions where the driver was succesfully rebound but information about potentially changed device minor number was not provided: https://gfx-ci.igk.intel.com/cibuglog-ng/results/all?query=test_name%20~=%20%22^igt@core_hotunplug@hotre%22%20AND%20status_name%20NOT%20IN%20[%22notrun%22,%22pass%22,%22skip%22]%20AND%20dmesg%20~=%20%22opening%20DRM%20device%20for%20hot(.*\n)*.*\[drm\]%20Initialized(.*\n)*.*opening%20DRM%20device%20for%20health%20check%22%20AND%20NOT%20stdout%20~=%20%22\nOpened%20device:%20/dev/dri/renderD12[0-9]\n(.*\n)*Opened%20device:%20/dev/dri/card[0-9]\n%22 I can see only cases where IGT runner didn't include such information from stdout in the report because it killed the test after timeout and stopped tracing its stdout before the test attempted to rebind the driver and had a chance to print those messages to stdout. *What value do your facts add to CI reporting then?* Thanks, Janusz > > > Thanks! > > [...] > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking 2024-12-09 13:50 ` Janusz Krzysztofik @ 2024-12-10 8:38 ` Peter Senna Tschudin 2024-12-10 9:50 ` Janusz Krzysztofik 0 siblings, 1 reply; 13+ messages in thread From: Peter Senna Tschudin @ 2024-12-10 8:38 UTC (permalink / raw) To: Janusz Krzysztofik, igt-dev@lists.freedesktop.org Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi, luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray, dominik.karol.piatkowski, katarzyna.piecielska Very good morning Janusz, Thank you for your detailed question. On 09.12.2024 14:50, Janusz Krzysztofik wrote: > On Monday, 9 December 2024 12:06:33 CET Peter Senna Tschudin wrote: >> Hi Janusz, >> >> On 09.12.2024 10:17, Janusz Krzysztofik wrote: >>> On Friday, 6 December 2024 06:45:31 CET Peter Senna Tschudin wrote: >>>> Hi Janusz, >>>> >>>> Thank you for your detailed comments. I appreciate the opportunity >>>> to clarify and address your concerns. >>>> >>>> On 05.12.2024 15:05, Janusz Krzysztofik wrote: >>>>> Hi Peter, >>>>> >>>>> On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote: >>>>>> When using igt-runner, collect facts before each test and after the >>>>>> last test, and report when facts change. The facts are: >>>>>> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20) >>>>>> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1 >>>>>> - Kernel taints: kernel.is_tainted.taint_warn: true >>>>>> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true >>>>>> >>>>>> This change imposes little execution overhead and adds just a few >>>>>> lines of logging. The facts will be printed on normal igt-runner >>>>>> output. Here is a real example from our CI shwoing >>>>>> hotreplug-lateclose changing the DRM card number >>>>> >>>>> Since you give that as an example of how helpful your facts can be, and follow >>>>> that with a kernel taint example, that may indicate you think, and users of >>>>> your facts may then be mislead having that read, that the taint was related to >>>>> the change of card number, while both had nothing to do with each other. >>>> >>>> Let’s take a step back to define the purpose and scope of igt-facts: >>>> - Definition of a fact from the dictionary: A fact is an objectively verifiable >>>> piece of information. >>>> - Purpose of igt-facts: Track which tests cause changes to the facts. >>>> >>>> The operation is straightforward: facts are collected before and after each test, >>>> and any differences are logged. Here’s an example showing a fact change and a new >>>> fact after running hotreplug-lateclose: >>>> >>>> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 >>>> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true >>>> >>>> This output highlights the facts without implying causation between them. The >>>> tool(and my commit message) neither explains relationships between facts nor >>>> misleads users into assuming causation. >>> >>> For me your commit message does. >> >> Can you please provide some external support for your claims? >> >> I am very careful while writing, and I am aware of my own limitations. >> After reading what chatGPT wrote about my text, I felt comfortable >> about not misleading anyone. Please mind the wording of my >> question: I did not hint a desired outcome for chatGPT. >> >> My Question: >> -- // -- >> Can you please indicate any potential causality relationship >> that the following text may offer? >> >> Here is a real example from our CI shwoing hotreplug-lateclose >> changing the DRM card number and tainting the kernel on the >> abort path >> -- // -- >> >> chatGPT answer: >> --//-- >> The text suggests a potential causality relationship between the >> "hotreplug-lateclose" operation and its effects on the system: >> >> 1. Causal Event: The "hotreplug-lateclose" operation. >> This likely refers to a process in the CI(Continuous Integration) >> environment related to hotplugging devices and managing late-close >> scenarios. Hotplugging involves dynamically adding or removing >> hardware components without shutting down the system. >> >> 2. Effects: >> >> * Changing the DRM card number: This implies that the operation >> affects the enumeration or reordering of DRM (Direct Rendering >> Manager) devices, causing the card number to be reassigned. >> >> * Kernel Tainting on the abort path: Kernel tainting typically >> indicates that the kernel has entered a state that is not >> guaranteed to be stable or reliable. This could occur due >> to improper handling of errors or unexpected conditions >> during the "abort path" (likely a failure or cleanup process >> in the operation). >> >> Suggested Causal Chain: >> 1. The "hotreplug-lateclose" operation executes in the CI. >> 2. This causes a change in DRM card enumeration (potentially due >> to how devices are initialized or re-initialized). >> 3. On the abort path of this operation, a condition arises that >> taints the kernel (possibly due to resource cleanup issues, >> unhandled errors, or driver misbehavior). >> >> This text highlights a possible link between hotreplug-lateclose >> handling and instabilities in DRM device management as well as >> kernel state integrity. >> --//-- >> >> >> >>> >>> Can you please provide a full list of "facts" your code is supposed to handle? >> >> This is in the commit message already, at the very begining. >> >>> Can you please explain why you selected just those "facts", not others? >> >> It was either what was missing, such as a convenient way to learn when >> something strange happend as a gpu disappearing from the PCI bus, or >> something that we believe may cause errors downstream such as a taint, >> and the list of loaded modules. >> >> For the drm card number association, we belive that there may be a caching >> issue: we are trying to figure it out if the drm-reopen cache handles the >> change of drm number association well. We have weak information pointing >> to a probable problem akin to missing cache invalidation. [...] > *What value do your facts add to CI reporting then?* It seems that the focus of your feedback has shifted from 'Peter is misleading people' to questioning the value of igt-runner. I believe it's important to address this change in perspective to ensure clarity and productive discussion. To address your new point, let me draw an analogy: asking "what is the value of igt-runner?" is similar to questioning the use of a car to reach the supermarket when walking is an option. While walking may suffice, the car adds efficiency and convenience, especially for larger or more complex "tasks". In our CI setup, a typical test list includes between 100 and 200 tests. As you rightly pointed out, determining which test changed a specific fact currently requires significant time and effort. When multiple tests modify different facts, the process is not practical. igt-facts adds value by: - Clearly identifying which tests change which facts. - Enabling us to know which facts were present and absent before the first test runs. - Reporting only changes, making the information concise and efficient to review. Moreover, for most developers, the igt-facts output will not be obtrusive. In our CI runs, the output is saved only to the igt_runner?? file and does not appear in dmesg or the results.json file. And even so, in the few hundred runs that I checked, igt-facts typically added between 3 and 16 lines of log to a file that has about 1000 lines. I hope this explanation helps clarify the purpose and value of igt-facts. Thank you, Peter ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking 2024-12-10 8:38 ` Peter Senna Tschudin @ 2024-12-10 9:50 ` Janusz Krzysztofik 2024-12-10 13:41 ` Knop, Ryszard 0 siblings, 1 reply; 13+ messages in thread From: Janusz Krzysztofik @ 2024-12-10 9:50 UTC (permalink / raw) To: igt-dev@lists.freedesktop.org, Peter Senna Tschudin Cc: Ryszard Knop, Zbigniew Kempczyński, Lucas De Marchi, luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray, dominik.karol.piatkowski, katarzyna.piecielska Hi Peter, On Tuesday, 10 December 2024 09:38:59 CET Peter Senna Tschudin wrote: > Very good morning Janusz, > > Thank you for your detailed question. > > On 09.12.2024 14:50, Janusz Krzysztofik wrote: > > On Monday, 9 December 2024 12:06:33 CET Peter Senna Tschudin wrote: > >> Hi Janusz, > >> > >> On 09.12.2024 10:17, Janusz Krzysztofik wrote: > >>> On Friday, 6 December 2024 06:45:31 CET Peter Senna Tschudin wrote: > >>>> Hi Janusz, > >>>> > >>>> Thank you for your detailed comments. I appreciate the opportunity > >>>> to clarify and address your concerns. > >>>> > >>>> On 05.12.2024 15:05, Janusz Krzysztofik wrote: > >>>>> Hi Peter, > >>>>> > >>>>> On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote: > >>>>>> When using igt-runner, collect facts before each test and after the > >>>>>> last test, and report when facts change. The facts are: > >>>>>> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20) > >>>>>> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1 > >>>>>> - Kernel taints: kernel.is_tainted.taint_warn: true > >>>>>> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true > >>>>>> > >>>>>> This change imposes little execution overhead and adds just a few > >>>>>> lines of logging. The facts will be printed on normal igt-runner > >>>>>> output. Here is a real example from our CI shwoing > >>>>>> hotreplug-lateclose changing the DRM card number > >>>>> > >>>>> Since you give that as an example of how helpful your facts can be, and follow > >>>>> that with a kernel taint example, that may indicate you think, and users of > >>>>> your facts may then be mislead having that read, that the taint was related to > >>>>> the change of card number, while both had nothing to do with each other. > >>>> > >>>> Let’s take a step back to define the purpose and scope of igt-facts: > >>>> - Definition of a fact from the dictionary: A fact is an objectively verifiable > >>>> piece of information. > >>>> - Purpose of igt-facts: Track which tests cause changes to the facts. > >>>> > >>>> The operation is straightforward: facts are collected before and after each test, > >>>> and any differences are logged. Here’s an example showing a fact change and a new > >>>> fact after running hotreplug-lateclose: > >>>> > >>>> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 > >>>> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true > >>>> > >>>> This output highlights the facts without implying causation between them. The > >>>> tool(and my commit message) neither explains relationships between facts nor > >>>> misleads users into assuming causation. > >>> > >>> For me your commit message does. > >> > >> Can you please provide some external support for your claims? > >> > >> I am very careful while writing, and I am aware of my own limitations. > >> After reading what chatGPT wrote about my text, I felt comfortable > >> about not misleading anyone. Please mind the wording of my > >> question: I did not hint a desired outcome for chatGPT. > >> > >> My Question: > >> -- // -- > >> Can you please indicate any potential causality relationship > >> that the following text may offer? > >> > >> Here is a real example from our CI shwoing hotreplug-lateclose > >> changing the DRM card number and tainting the kernel on the > >> abort path > >> -- // -- > >> > >> chatGPT answer: > >> --//-- > >> The text suggests a potential causality relationship between the > >> "hotreplug-lateclose" operation and its effects on the system: > >> > >> 1. Causal Event: The "hotreplug-lateclose" operation. > >> This likely refers to a process in the CI(Continuous Integration) > >> environment related to hotplugging devices and managing late-close > >> scenarios. Hotplugging involves dynamically adding or removing > >> hardware components without shutting down the system. > >> > >> 2. Effects: > >> > >> * Changing the DRM card number: This implies that the operation > >> affects the enumeration or reordering of DRM (Direct Rendering > >> Manager) devices, causing the card number to be reassigned. > >> > >> * Kernel Tainting on the abort path: Kernel tainting typically > >> indicates that the kernel has entered a state that is not > >> guaranteed to be stable or reliable. This could occur due > >> to improper handling of errors or unexpected conditions > >> during the "abort path" (likely a failure or cleanup process > >> in the operation). > >> > >> Suggested Causal Chain: > >> 1. The "hotreplug-lateclose" operation executes in the CI. > >> 2. This causes a change in DRM card enumeration (potentially due > >> to how devices are initialized or re-initialized). > >> 3. On the abort path of this operation, a condition arises that > >> taints the kernel (possibly due to resource cleanup issues, > >> unhandled errors, or driver misbehavior). > >> > >> This text highlights a possible link between hotreplug-lateclose > >> handling and instabilities in DRM device management as well as > >> kernel state integrity. > >> --//-- > >> > >> > >> > >>> > >>> Can you please provide a full list of "facts" your code is supposed to handle? > >> > >> This is in the commit message already, at the very begining. > >> > >>> Can you please explain why you selected just those "facts", not others? > >> > >> It was either what was missing, such as a convenient way to learn when > >> something strange happend as a gpu disappearing from the PCI bus, or > >> something that we believe may cause errors downstream such as a taint, > >> and the list of loaded modules. > >> > >> For the drm card number association, we belive that there may be a caching > >> issue: we are trying to figure it out if the drm-reopen cache handles the > >> change of drm number association well. We have weak information pointing > >> to a probable problem akin to missing cache invalidation. > > [...] > > *What value do your facts add to CI reporting then?* > > It seems that the focus of your feedback has shifted from 'Peter is > misleading people' to questioning the value of igt-runner. I believe > it's important to address this change in perspective to ensure > clarity and productive discussion. > > To address your new point, let me draw an analogy: asking "what is > the value of igt-runner?" is similar to questioning the use of a car > to reach the supermarket when walking is an option. While walking may > suffice, the car adds efficiency and convenience, especially for > larger or more complex "tasks". > > In our CI setup, a typical test list includes between 100 and > 200 tests. As you rightly pointed out, determining which test changed > a specific fact currently requires significant time and effort. When > multiple tests modify different facts, the process is not practical. > > igt-facts adds value by: > - Clearly identifying which tests change which facts. > - Enabling us to know which facts were present and absent before the > first test runs. > - Reporting only changes, making the information concise and > efficient to review. > > Moreover, for most developers, the igt-facts output will not be > obtrusive. In our CI runs, the output is saved only to the > igt_runner?? file and does not appear in dmesg or the results.json > file. > > And even so, in the few hundred runs that I checked, igt-facts > typically added between 3 and 16 lines of log to a file that has > about 1000 lines. > > I hope this explanation helps clarify the purpose and value of > igt-facts. Unfortunately not. I still can't see how your 4 facts reported by igt_runner could be useful to me. But let's hear from those who share your points. Thanks, Janusz > > Thank you, > > Peter > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking 2024-12-10 9:50 ` Janusz Krzysztofik @ 2024-12-10 13:41 ` Knop, Ryszard 0 siblings, 0 replies; 13+ messages in thread From: Knop, Ryszard @ 2024-12-10 13:41 UTC (permalink / raw) To: igt-dev@lists.freedesktop.org, peter.senna@linux.intel.com, janusz.krzysztofik@linux.intel.com Cc: Das, Nirmoy, Coelho, Luciano, Piecielska, Katarzyna, Kempczynski, Zbigniew, Piatkowski, Dominik Karol, Summers, Stuart, Ghimiray, Himal Prasad, De Marchi, Lucas On Tue, 2024-12-10 at 10:50 +0100, Janusz Krzysztofik wrote: > Hi Peter, > > On Tuesday, 10 December 2024 09:38:59 CET Peter Senna Tschudin wrote: > > Very good morning Janusz, > > > > Thank you for your detailed question. > > > > On 09.12.2024 14:50, Janusz Krzysztofik wrote: > > > On Monday, 9 December 2024 12:06:33 CET Peter Senna Tschudin wrote: > > > > Hi Janusz, > > > > > > > > On 09.12.2024 10:17, Janusz Krzysztofik wrote: > > > > > On Friday, 6 December 2024 06:45:31 CET Peter Senna Tschudin wrote: > > > > > > Hi Janusz, > > > > > > > > > > > > Thank you for your detailed comments. I appreciate the opportunity > > > > > > to clarify and address your concerns. > > > > > > > > > > > > On 05.12.2024 15:05, Janusz Krzysztofik wrote: > > > > > > > Hi Peter, > > > > > > > > > > > > > > On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote: > > > > > > > > When using igt-runner, collect facts before each test and after the > > > > > > > > last test, and report when facts change. The facts are: > > > > > > > > - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20) > > > > > > > > - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1 > > > > > > > > - Kernel taints: kernel.is_tainted.taint_warn: true > > > > > > > > - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true > > > > > > > > > > > > > > > > This change imposes little execution overhead and adds just a few > > > > > > > > lines of logging. The facts will be printed on normal igt-runner > > > > > > > > output. Here is a real example from our CI shwoing > > > > > > > > hotreplug-lateclose changing the DRM card number > > > > > > > > > > > > > > Since you give that as an example of how helpful your facts can be, and follow > > > > > > > that with a kernel taint example, that may indicate you think, and users of > > > > > > > your facts may then be mislead having that read, that the taint was related to > > > > > > > the change of card number, while both had nothing to do with each other. > > > > > > > > > > > > Let’s take a step back to define the purpose and scope of igt-facts: > > > > > > - Definition of a fact from the dictionary: A fact is an objectively verifiable > > > > > > piece of information. > > > > > > - Purpose of igt-facts: Track which tests cause changes to the facts. > > > > > > > > > > > > The operation is straightforward: facts are collected before and after each test, > > > > > > and any differences are logged. Here’s an example showing a fact change and a new > > > > > > fact after running hotreplug-lateclose: > > > > > > > > > > > > [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 > > > > > > [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true > > > > > > > > > > > > This output highlights the facts without implying causation between them. The > > > > > > tool(and my commit message) neither explains relationships between facts nor > > > > > > misleads users into assuming causation. > > > > > > > > > > For me your commit message does. > > > > > > > > Can you please provide some external support for your claims? > > > > > > > > I am very careful while writing, and I am aware of my own limitations. > > > > After reading what chatGPT wrote about my text, I felt comfortable > > > > about not misleading anyone. Please mind the wording of my > > > > question: I did not hint a desired outcome for chatGPT. > > > > > > > > My Question: > > > > -- // -- > > > > Can you please indicate any potential causality relationship > > > > that the following text may offer? > > > > > > > > Here is a real example from our CI shwoing hotreplug-lateclose > > > > changing the DRM card number and tainting the kernel on the > > > > abort path > > > > -- // -- > > > > > > > > chatGPT answer: > > > > --//-- > > > > The text suggests a potential causality relationship between the > > > > "hotreplug-lateclose" operation and its effects on the system: > > > > > > > > 1. Causal Event: The "hotreplug-lateclose" operation. > > > > This likely refers to a process in the CI(Continuous Integration) > > > > environment related to hotplugging devices and managing late-close > > > > scenarios. Hotplugging involves dynamically adding or removing > > > > hardware components without shutting down the system. > > > > > > > > 2. Effects: > > > > > > > > * Changing the DRM card number: This implies that the operation > > > > affects the enumeration or reordering of DRM (Direct Rendering > > > > Manager) devices, causing the card number to be reassigned. > > > > > > > > * Kernel Tainting on the abort path: Kernel tainting typically > > > > indicates that the kernel has entered a state that is not > > > > guaranteed to be stable or reliable. This could occur due > > > > to improper handling of errors or unexpected conditions > > > > during the "abort path" (likely a failure or cleanup process > > > > in the operation). > > > > > > > > Suggested Causal Chain: > > > > 1. The "hotreplug-lateclose" operation executes in the CI. > > > > 2. This causes a change in DRM card enumeration (potentially due > > > > to how devices are initialized or re-initialized). > > > > 3. On the abort path of this operation, a condition arises that > > > > taints the kernel (possibly due to resource cleanup issues, > > > > unhandled errors, or driver misbehavior). > > > > > > > > This text highlights a possible link between hotreplug-lateclose > > > > handling and instabilities in DRM device management as well as > > > > kernel state integrity. > > > > --//-- (Really not a fan of posting AI slop as an argument for how well other people can understand your position, especially if you ask it to perform any sort of reasoning...) > > > > > > > > > > > > > > > > > > Can you please provide a full list of "facts" your code is supposed to handle? > > > > > > > > This is in the commit message already, at the very begining. > > > > > > > > > Can you please explain why you selected just those "facts", not others? > > > > > > > > It was either what was missing, such as a convenient way to learn when > > > > something strange happend as a gpu disappearing from the PCI bus, or > > > > something that we believe may cause errors downstream such as a taint, > > > > and the list of loaded modules. > > > > > > > > For the drm card number association, we belive that there may be a caching > > > > issue: we are trying to figure it out if the drm-reopen cache handles the > > > > change of drm number association well. We have weak information pointing > > > > to a probable problem akin to missing cache invalidation. > > > > [...] > > > *What value do your facts add to CI reporting then?* > > > > It seems that the focus of your feedback has shifted from 'Peter is > > misleading people' to questioning the value of igt-runner. I believe > > it's important to address this change in perspective to ensure > > clarity and productive discussion. > > > > To address your new point, let me draw an analogy: asking "what is > > the value of igt-runner?" is similar to questioning the use of a car > > to reach the supermarket when walking is an option. While walking may > > suffice, the car adds efficiency and convenience, especially for > > larger or more complex "tasks". > > > > In our CI setup, a typical test list includes between 100 and > > 200 tests. As you rightly pointed out, determining which test changed > > a specific fact currently requires significant time and effort. When > > multiple tests modify different facts, the process is not practical. > > > > igt-facts adds value by: > > - Clearly identifying which tests change which facts. > > - Enabling us to know which facts were present and absent before the > > first test runs. > > - Reporting only changes, making the information concise and > > efficient to review. > > > > Moreover, for most developers, the igt-facts output will not be > > obtrusive. In our CI runs, the output is saved only to the > > igt_runner?? file and does not appear in dmesg or the results.json > > file. > > > > And even so, in the few hundred runs that I checked, igt-facts > > typically added between 3 and 16 lines of log to a file that has > > about 1000 lines. > > > > I hope this explanation helps clarify the purpose and value of > > igt-facts. > > Unfortunately not. I still can't see how your 4 facts reported by igt_runner > could be useful to me. But let's hear from those who share your points. Here's 3 reasons why they seem useful to me: - CI itself performs some of these checks in the pipelines (loaded modules, kernel taints before testing starts, likely more in the future), so this could simplify fetching this data if we just parsed the IGT lsfacts output. - Simplicity. I can take a quick look into the runner logs, see a bunch of successful tests, notice one of the tests left the system tainted and the GPU dropped off the bus - uh-huh, at least I can quickly filter down the interesting logs to what happened in tests near the reported fact changes. This is nothing revolutionary that I could not divine from dmesgs/outs/errs one way or another, but it sure is easy and convenient, especially for people who are not kernel developers but have to review and debug this stuff anyways. - It also makes it easier to notice passing tests that leave the system in an unclean state - for example, if a display drops off after a successful test and what follows does not immediately use displays, so some runs appear healthy, others less so (=> reasoning for noise runs). So yes, I'd like to see this framework in IGTs. It's not strictly necessary, but it would make our lives a little easier. Thanks, Ryszard > Thanks, > Janusz > > > > > Thank you, > > > > Peter > > > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH i-g-t] igt-runner fact checking @ 2024-11-02 11:37 Peter Senna Tschudin 2024-12-05 4:54 ` [PATCH i-g-t v10] " Peter Senna Tschudin 0 siblings, 1 reply; 13+ messages in thread From: Peter Senna Tschudin @ 2024-11-02 11:37 UTC (permalink / raw) To: igt-dev@lists.freedesktop.org; +Cc: Lucas De Marchi On igt-runner, before each test, and after the last test collect and report facts, such as: - hardware.pci.gpu_at_addr.0000:03:00.0 = 8086:e20b Intel Battlemage (Gen20) - hardware.pci.drm_card_at_addr.0000:03:00.0 = card1 - kernel.kmod_is_loaded.i915 = true Reports new, deleted and changed facts. Here is an example: $ cat test-list igt@core_hotunplug@hotrebind igt@xe_module_load@reload-no-display $ ... ./build/runner/igt_runner ... [51.225490] [FACT before any test] new hardware.pci.gpu_at_addr.0000:03:00.0 = 8086:e20b Intel Battlemage (Gen20) [51.225700] [FACT before any test] new hardware.pci.gpu_at_addr.0000:00:02.0 = 8086:a782 Intel Raptorlake_s (Gen12) [51.227334] [1/2] core_hotunplug (hotrebind) [59.429470] [FACT core_hotunplug (hotrebind)] new hardware.pci.drm_card_at_addr.0000:03:00.0 = card1 [59.429486] [FACT core_hotunplug (hotrebind)] new hardware.pci.drm_card_at_addr.0000:00:02.0 = card2 [59.430002] [FACT core_hotunplug (hotrebind)] new kernel.kmod_is_loaded.amdgpu = true [59.430011] [FACT core_hotunplug (hotrebind)] new kernel.kmod_is_loaded.i915 = true [59.430014] [FACT core_hotunplug (hotrebind)] new kernel.kmod_is_loaded.xe = true [59.430456] [2/2] xe_module_load (reload-no-display) [61.521348] [FACT xe_module_load (reload-no-display)] deleted hardware.pci.drm_card_at_addr.0000:03:00.0 = card1 [61.521834] [FACT xe_module_load (reload-no-display)] deleted kernel.kmod_is_loaded.xe = true Done. As igt@core_hotunplug@hotrebind makes changes to the system, running the same test-list again produces different results: $ ... ./build/runner/igt_runner ... [141.870776] [FACT before any test] new hardware.pci.gpu_at_addr.0000:03:00.0 = 8086:e20b Intel Battlemage (Gen20) [141.870820] [FACT before any test] new hardware.pci.gpu_at_addr.0000:00:02.0 = 8086:a782 Intel Raptorlake_s (Gen12) [141.872987] [FACT before any test] new hardware.pci.drm_card_at_addr.0000:00:02.0 = card2 [141.874137] [FACT before any test] new kernel.kmod_is_loaded.amdgpu = true [141.874144] [FACT before any test] new kernel.kmod_is_loaded.i915 = true [141.874596] [1/2] core_hotunplug (hotrebind) [146.662913] [FACT core_hotunplug (hotrebind)] changed hardware.pci.drm_card_at_addr.0000:00:02.0: card2 -> card0 [146.663850] [2/2] xe_module_load (reload-no-display) Done. TODO - Save logs to disk. Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com> --- lib/igt_facts.c | 443 ++++++++++++++++++++++++++++++++++++++++++++++ lib/igt_facts.h | 51 ++++++ lib/meson.build | 1 + runner/executor.c | 9 + 4 files changed, 504 insertions(+) create mode 100644 lib/igt_facts.c create mode 100644 lib/igt_facts.h diff --git a/lib/igt_facts.c b/lib/igt_facts.c new file mode 100644 index 000000000..90b362c5e --- /dev/null +++ b/lib/igt_facts.c @@ -0,0 +1,443 @@ +// SPDX-License-Identifier: MIT + +/* + * Copyright © 2024 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + */ + +#include <stdio.h> +#include <glib.h> +#include <libudev.h> +#include "igt_core.h" +#include "igt_kmod.h" +#include "igt_facts.h" +#include "igt_device_scan.h" + + +/* Report new facts and fact changes + * - new: new_value is NULL + * - change: new_value is not NULL + * - delete: delete is true + */ +static void report_fact_change(igt_fact *fact, const char *new_value, + const char *last_test, bool delete) +{ + struct timespec uptime_ts; + char *uptime = NULL; + + if (last_test == NULL) + last_test = g_strdup("before any test"); + + if (clock_gettime(CLOCK_BOOTTIME, &uptime_ts) != 0) + return; + + uptime = g_strdup_printf("%ld.%06ld", + uptime_ts.tv_sec, + uptime_ts.tv_nsec / 1000); + + /* If delete is true, the fact was deleted */ + if (delete) { + igt_info("[%s] [FACT %s] deleted %s = %s\n", + uptime, last_test, fact->name, fact->value); + return; + } + + /* If new_value is NULL, it is a new fact */ + if (new_value == NULL) { + igt_info("[%s] [FACT %s] new %s = %s\n", + uptime, last_test, fact->name, fact->value); + return; + } + + /* Check if the value changed */ + if (strcmp(fact->value, new_value) != 0) { + igt_info("[%s] [FACT %s] changed %s: %s -> %s\n", + uptime, last_test, fact->name, fact->value, new_value); + } + g_free(uptime); +} + +/* Get a fact by name. Return NULL if fact is not found. */ +static igt_fact *get_fact(igt_fact_list *list, const char *name) +{ + if (!list || list->facts == NULL || list->num_facts == 0) + return NULL; + + for (int i = 0; i < list->num_facts; i++) { + if (list->facts[i].name == NULL) + continue; + + if (strcmp(list->facts[i].name, name) == 0) + return &list->facts[i]; + } + return NULL; +} + +static bool delete_fact(igt_fact_list *list, const char *name, const char *last_test) +{ + igt_fact *fact = NULL; + int i; + + fact = get_fact(list, name); + if (fact == NULL) + return false; + + /* Report the deletion */ + report_fact_change(fact, NULL, last_test, true); + + /* Move all facts after the one to be deleted one step back */ + for (i = 0; i < list->num_facts; i++) { + if (strcmp(list->facts[i].name, name) == 0) { + g_free(list->facts[i].name); + g_free(list->facts[i].value); + + for (int j = i; j < list->num_facts - 1; j++) + list->facts[j] = list->facts[j + 1]; + break; + } + } + list->num_facts--; + list->facts = realloc(list->facts, list->num_facts * sizeof(igt_fact)); + + return true; +} + +/* Only used while creating the new_list. Not intended to add facts to the + * global fact list. + */ +static bool set_fact(igt_fact_list *list, const char *name, + const char *value, const char *last_test) +{ + igt_fact *fact = NULL; + + /* Check that name and value are not null */ + if (name == NULL || value == NULL) + return false; + + fact = get_fact(list, name); + if (fact != NULL) + return false; /* Should not happen */ + + /* Add a new fact */ + list->num_facts++; + list->facts = realloc(list->facts, list->num_facts * sizeof(igt_fact)); + list->facts[list->num_facts - 1].name = g_strdup(name); + list->facts[list->num_facts - 1].value = g_strdup(value); + + if (last_test) + list->facts[list->num_facts - 1].last_test = g_strdup(last_test); + else + list->facts[list->num_facts - 1].last_test = NULL; + + return true; +} + +/* Add a new fact or update the value of an existing fact. Return false if + * the value is the same as the existing one. + */ +static bool update_list(igt_fact_list *list, const char *name, + const char *value, const char *last_test) +{ + igt_fact *fact = NULL; + + /* Check that name and value are not null */ + if (name == NULL || value == NULL) + return false; + + fact = get_fact(list, name); + if (fact != NULL) { + /* Return false if fact->value equals value */ + if (strcmp(fact->value, value) == 0) + return false; /* Not changed */ + + report_fact_change(fact, value, last_test, false); + + /* Update the value */ + fact->value = g_strdup(value); + return true; + } + + /* Add a new fact */ + list->num_facts++; + list->facts = realloc(list->facts, list->num_facts * sizeof(igt_fact)); + list->facts[list->num_facts - 1].name = g_strdup(name); + list->facts[list->num_facts - 1].value = g_strdup(value); + + /* Report new fact */ + report_fact_change(&list->facts[list->num_facts - 1], NULL, + last_test, false); + + return true; +} + +/* Merge list and new_list. fact_group is used to filer entries on list + * that should be deleted. For example, if fact_group is "pci_gpu", we + * can delete the fact hardware.pci.gpu_at.0000:00:02.0 if it doesn't + * exist in new_list. + */ +static void merge_facts(igt_fact_list *list, igt_fact_list *new_list, + const char *fact_group, const char *last_test) +{ + /* Filter by fact_group and delete facts that exist on list + * but not on new_list + */ + for (int i = 0; i < list->num_facts; i++) { + if (strstr(list->facts[i].name, fact_group) == NULL) + continue; + if (get_fact(new_list, list->facts[i].name) == NULL) + delete_fact(list, list->facts[i].name, last_test); + } + + /* Add and update facts from new_list to list */ + for (int i = 0; i < new_list->num_facts; i++) { + if (strstr(new_list->facts[i].name, fact_group) == NULL) + continue; /* Should never happen */ + update_list(list, new_list->facts[i].name, new_list->facts[i].value, + new_list->facts[i].last_test); + } +} + +static void scan_pci_for_gpus(igt_fact_list *list, const char *last_test) +{ + struct udev *udev = NULL; + struct udev_enumerate *enumerate = NULL; + struct udev_list_entry *devices, *dev_list_entry; + struct igt_device_card card; + char pcistr[10]; + int ret; + char *factname = NULL; + char *factvalue = NULL; + igt_fact_list new_list = {0}; + + udev = udev_new(); + if (!udev) { + igt_warn("Failed to create udev context\n"); + return; + } + + enumerate = udev_enumerate_new(udev); + if (!enumerate) { + igt_warn("Failed to create udev enumerate\n"); + udev_unref(udev); + return; + } + + ret = udev_enumerate_add_match_subsystem(enumerate, "pci"); + if (ret < 0) + goto out; + + ret = udev_enumerate_add_match_property(enumerate, "PCI_CLASS", "30000"); + if (ret < 0) + goto out; + + ret = udev_enumerate_add_match_property(enumerate, "PCI_CLASS", "38000"); + if (ret < 0) + goto out; + + ret = udev_enumerate_scan_devices(enumerate); + if (ret < 0) + goto out; + + devices = udev_enumerate_get_list_entry(enumerate); + if (!devices) + goto out; + + udev_list_entry_foreach(dev_list_entry, devices) { + const char *path; + struct udev_device *udev_dev; + struct udev_list_entry *entry; + char *model = NULL; + char *codename = NULL; + + path = udev_list_entry_get_name(dev_list_entry); + udev_dev = udev_device_new_from_syspath(udev, path); + if (!udev_dev) + continue; + + /* Strip path to only the content after the last / */ + path = strrchr(path, '/'); + if (path) + path++; + else + path = "unknown"; + + strcpy(card.pci_slot_name, "-"); + + entry = udev_device_get_properties_list_entry(udev_dev); + while (entry) { + const char *name = udev_list_entry_get_name(entry); + const char *value = udev_list_entry_get_value(entry); + + entry = udev_list_entry_get_next(entry); + if (!strcmp(name, "ID_MODEL_FROM_DATABASE")) + model = g_strdup(value); + else if (!strcmp(name, "PCI_ID")) + igt_assert_eq(sscanf(value, "%hx:%hx", + &card.pci_vendor, &card.pci_device), 2); + } + snprintf(pcistr, sizeof(pcistr), "%04x:%04x", + card.pci_vendor, card.pci_device); + codename = igt_device_get_pretty_name(&card, false); + + factname = g_strdup_printf("%s.%s", pci_gpu_fact, path); + factvalue = g_strdup_printf("%s %s %s", + pcistr, + codename ? codename : "", + model ? model : ""); + + set_fact(&new_list, factname, factvalue, last_test); + + free(codename); + free(model); + udev_device_unref(udev_dev); + } + merge_facts(list, &new_list, pci_gpu_fact, last_test); + +out: + udev_enumerate_unref(enumerate); + udev_unref(udev); +} + +static void scan_pci_drm_cards(igt_fact_list *list, const char *last_test) +{ + struct udev *udev = NULL; + struct udev_enumerate *enumerate = NULL; + struct udev_list_entry *devices, *dev_list_entry; + int ret; + char *factname = NULL; + char *factvalue = NULL; + igt_fact_list new_list = {0}; + + udev = udev_new(); + if (!udev) + return; + + enumerate = udev_enumerate_new(udev); + if (!enumerate) { + udev_unref(udev); + return; + } + + ret = udev_enumerate_add_match_subsystem(enumerate, "drm"); + if (ret < 0) + goto out; + + ret = udev_enumerate_scan_devices(enumerate); + if (ret < 0) + goto out; + + devices = udev_enumerate_get_list_entry(enumerate); + if (!devices) + goto out; + + ret = udev_enumerate_add_match_subsystem(enumerate, "drm"); + if (ret < 0) + goto out; + + ret = udev_enumerate_scan_devices(enumerate); + if (ret < 0) + goto out; + + devices = udev_enumerate_get_list_entry(enumerate); + if (!devices) + goto out; + + udev_list_entry_foreach(dev_list_entry, devices) { + const char *path; + struct udev_device *drm_dev, *pci_dev; + const char *drm_name, *pci_addr; + + path = udev_list_entry_get_name(dev_list_entry); + drm_dev = udev_device_new_from_syspath(udev, path); + if (!drm_dev) + continue; + + drm_name = udev_device_get_sysname(drm_dev); + /* Filter the device by name. Want devices such as card0 and card1. + * If the device has '-' in the name, contine + */ + if (strncmp(drm_name, "card", 4) != 0 || strchr(drm_name, '-') != NULL) { + udev_device_unref(drm_dev); + continue; + } + + /* Get the pci address of the gpu associated with the drm_dev*/ + pci_dev = udev_device_get_parent_with_subsystem_devtype(drm_dev, "pci", NULL); + if (pci_dev) { + pci_addr = udev_device_get_sysattr_value(pci_dev, "address"); + if (!pci_addr) + pci_addr = udev_device_get_sysname(pci_dev); + } else { + /* Some GPUs are platform devices. Ignore them. */ + pci_addr = NULL; + udev_device_unref(drm_dev); + continue; + } + + factname = g_strdup_printf("%s.%s", drm_card_fact, pci_addr); + factvalue = g_strdup_printf("%s", drm_name); + set_fact(&new_list, factname, factvalue, last_test); + + udev_device_unref(drm_dev); + } + if (new_list.num_facts > 0) + merge_facts(list, &new_list, drm_card_fact, last_test); + +out: + udev_enumerate_unref(enumerate); + udev_unref(udev); +} + +static void scan_kernel_loaded_kmods(igt_fact_list *list, const char *last_test) +{ + char *name = NULL; + igt_fact_list new_list = {0}; + + /* Iterate over igt_fact_kmod_list[] until the element contains "\0" */ + for (int i = 0; strcmp(igt_fact_kmod_list[i], "\0") != 0; i++) { + name = g_strdup_printf("%s.%s", kmod_fact, igt_fact_kmod_list[i]); + if (igt_kmod_is_loaded(igt_fact_kmod_list[i])) + set_fact(&new_list, name, "true", last_test); + + g_free(name); + } + merge_facts(list, &new_list, kmod_fact, last_test); +} + +void igt_facts(igt_fact_list *list, const char *last_test) +{ + scan_pci_for_gpus(list, last_test); + scan_pci_drm_cards(list, last_test); + scan_kernel_loaded_kmods(list, last_test); + + /* Flush stdout and stderr */ + fflush(stdout); + fflush(stderr); +} + +/* print_all_facts: pretty print all facts */ +void print_all_facts(igt_fact_list *list) +{ + igt_info("Number of facts: %d\n", list->num_facts); + for (int i = 0; i < list->num_facts; i++) + igt_info(" - %s: %s\n", list->facts[i].name, list->facts[i].value); +} diff --git a/lib/igt_facts.h b/lib/igt_facts.h new file mode 100644 index 000000000..289530cce --- /dev/null +++ b/lib/igt_facts.h @@ -0,0 +1,51 @@ +/* SPDX-License-Identifier: MIT + * + * Copyright © 2024 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + */ + +typedef struct { + char *name; + char *value; + char *last_test; +} igt_fact; + +typedef struct { + igt_fact *facts; + int num_facts; +} igt_fact_list; + +const char *igt_fact_kmod_list[] = { + "amdgpu", + "i915", + "nouveau", + "radeon", + "xe", + "\0" +}; + +const char *kmod_fact = "kernel.kmod_is_loaded"; /* true or false */ +const char *pci_gpu_fact = "hardware.pci.gpu_at_addr"; /* id vendor model */ +const char *drm_card_fact = "hardware.pci.drm_card_at_addr"; /* cardX */ + +void igt_facts(igt_fact_list *list, const char *last_test); +void print_all_facts(igt_fact_list *list); diff --git a/lib/meson.build b/lib/meson.build index c3556a921..c44ca2b5a 100644 --- a/lib/meson.build +++ b/lib/meson.build @@ -18,6 +18,7 @@ lib_sources = [ 'i915/i915_crc.c', 'igt_collection.c', 'igt_color_encoding.c', + 'igt_facts.c', 'igt_crc.c', 'igt_debugfs.c', 'igt_device.c', diff --git a/runner/executor.c b/runner/executor.c index ac73e1dde..6ff252174 100644 --- a/runner/executor.c +++ b/runner/executor.c @@ -35,6 +35,7 @@ #include "executor.h" #include "output_strings.h" #include "runnercomms.h" +#include "igt_facts.h" #define KMSG_HEADER "[IGT] " #define KMSG_WARN 4 @@ -2306,6 +2307,8 @@ bool execute(struct execute_state *state, sigset_t sigmask; double time_spent = 0.0; bool status = true; + igt_fact_list fact_list = {0}; + char *last_test = NULL; if (state->dry) { outf("Dry run, not executing. Invoke igt_resume if you want to execute.\n"); @@ -2438,6 +2441,10 @@ bool execute(struct execute_state *state, int result; bool already_written = false; + /* Calls before running each test */ + igt_facts(&fact_list, last_test); + last_test = entry_display_name(&job_list->entries[state->next]); + if (should_die_because_signal(sigfd)) { status = false; goto end; @@ -2526,6 +2533,8 @@ bool execute(struct execute_state *state, return execute(state, settings, job_list); } } + /* Last call to collect facts after the last test runs */ + igt_facts(&fact_list, last_test); if ((timefd = openat(resdirfd, "endtime.txt", O_CREAT | O_WRONLY | O_EXCL, 0666)) >= 0) { dprintf(timefd, "%f\n", timeofday_double()); -- 2.34.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH i-g-t v10] igt-runner fact checking 2024-11-02 11:37 [PATCH i-g-t] " Peter Senna Tschudin @ 2024-12-05 4:54 ` Peter Senna Tschudin 2024-12-05 9:08 ` Piatkowski, Dominik Karol 2024-12-06 11:42 ` Kamil Konieczny 0 siblings, 2 replies; 13+ messages in thread From: Peter Senna Tschudin @ 2024-12-05 4:54 UTC (permalink / raw) To: igt-dev@lists.freedesktop.org Cc: Ryszard Knop, Janusz Krzysztofik, Zbigniew Kempczyński, Lucas De Marchi, luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray, dominik.karol.piatkowski, katarzyna.piecielska When using igt-runner, collect facts before each test and after the last test, and report when facts change. The facts are: - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20) - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1 - Kernel taints: kernel.is_tainted.taint_warn: true - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true This change imposes little execution overhead and adds just a few lines of logging. The facts will be printed on normal igt-runner output. Here is a real example from our CI showing hotreplug-lateclose changing the DRM card number and tainting the kernel on the abort path: [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose) [245.383596] Starting subtest: hotreplug-lateclose [249.843361] Aborting: Lockdep not active [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true [249.859075] Closing watchdogs CC: Ryszard Knop <ryszard.knop@intel.com> CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> CC: Lucas De Marchi <lucas.demarchi@intel.com> CC: luciano.coelho@intel.com CC: nirmoy.das@intel.com CC: stuart.summers@intel.com CC: himal.prasad.ghimiray@intel.com CC: dominik.karol.piatkowski@intel.com CC: katarzyna.piecielska@intel.com Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com> --- Re-sending as the mailing list server rejected the message yesterday due to a probably full disk... v10: - fix memory leaks from asprintf (Thank you Dominik Karol!) - fix comments for consistency (Thank you Dominik Karol!) v9: - do not report new hardware when loading/unloading kmod changes the string of the GPU name. I accidentally reintroduced this issue when refactoring to use linked lists. - add tools/lsfacts: 9 lines of code that print either the facts or that no facts were found. - fix code comments describing functions - fix white space issues v8: - fix white space issues v7: - refactor to use linked lists provided by igt_lists - Added function arguments to code comments - updated commit message v6: - sort includes in igt_facts.c alphabetically - add facts for kernel taints using igt_kernel_tainted() and igt_explain_taints() v5: - fix the broken patch format from v4 v4: - fix a bug on delete_fact() - drop glib and calls to g_ functions - change commit message to indicate that report only on fact changes - use consistent format for reporting changes - fix SPDX header format v3: - refreshed commit message - changed format SPDX string - removed license text - replace last_test assignment when null by two ternary operators - added function descriptions following example found elsewhere in the code - added igt_assert to catch failures to realloc() v2: - add lib/tests/igt_facts.c for basic unit testing - bugfix: do not report a new gpu when the driver changes the gpu name - bugfix: do not report the pci_id twice on the gpu name lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++ lib/igt_facts.h | 47 +++ lib/meson.build | 1 + lib/tests/igt_facts.c | 15 + lib/tests/meson.build | 1 + runner/executor.c | 10 + tools/lsfacts.c | 25 ++ tools/meson.build | 1 + 8 files changed, 855 insertions(+) create mode 100644 lib/igt_facts.c create mode 100644 lib/igt_facts.h create mode 100644 lib/tests/igt_facts.c create mode 100644 tools/lsfacts.c diff --git a/lib/igt_facts.c b/lib/igt_facts.c new file mode 100644 index 000000000..4749d3417 --- /dev/null +++ b/lib/igt_facts.c @@ -0,0 +1,755 @@ +// SPDX-License-Identifier: MIT +// Copyright © 2024 Intel Corporation + +#include <ctype.h> +#include <libudev.h> +#include <stdio.h> +#include <sys/time.h> +#include <time.h> + +#include "igt_core.h" +#include "igt_device_scan.h" +#include "igt_facts.h" +#include "igt_kmod.h" +#include "igt_list.h" +#include "igt_taints.h" + +static struct igt_list_head igt_facts_list_drm_card_head; +static struct igt_list_head igt_facts_list_kmod_head; +static struct igt_list_head igt_facts_list_ktaint_head; +static struct igt_list_head igt_facts_list_pci_gpu_head; + + +/** + * igt_facts_lists_init: + * + * Initialize igt_facts linked lists. + * + * Returns: void + */ +void igt_facts_lists_init(void) +{ + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head); + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head); + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head); + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head); +} + + +/** + * igt_facts_log: + * @last_test: name of the test that triggered the fact + * @name: name of the fact + * @new_value: new value of the fact + * @old_value: old value of the fact + * + * Reports fact changes: + * - new fact: if old_value is NULL and new_value is not NULL + * - deleted fact: if new_value is NULL and old_value is not NULL + * - changed fact: if new_value is different from old_value + * + * Returns: void + */ +static void igt_facts_log(const char *last_test, const char *name, + const char *new_value, const char *old_value) +{ + struct timespec uptime_ts; + char *uptime = NULL; + const char *before_tests = "before any test"; + + if (old_value == NULL && new_value == NULL) + return; + + if (clock_gettime(CLOCK_BOOTTIME, &uptime_ts) != 0) + return; + + asprintf(&uptime, + "%ld.%06ld", + uptime_ts.tv_sec, + uptime_ts.tv_nsec / 1000); + + /* New fact */ + if (old_value == NULL && new_value != NULL) { + igt_info("[%s] [FACT %s] new: %s: %s\n", + uptime, + last_test ? last_test : before_tests, + name, + new_value); + goto out; + } + + /* Update fact */ + if (old_value != NULL && new_value != NULL) { + igt_info("[%s] [FACT %s] changed: %s: %s -> %s\n", + uptime, + last_test ? last_test : before_tests, + name, + old_value, + new_value); + goto out; + } + + /* Deleted fact */ + if (old_value != NULL && new_value == NULL) { + igt_info("[%s] [FACT %s] deleted: %s: %s\n", + uptime, + last_test ? last_test : before_tests, + name, + old_value); + goto out; + } + +out: + free(uptime); +} + +/** + * igt_facts_list_get: + * @name: name of the fact to be added + * @head: head of the list + * + * Get a fact from the list. + * + * Returns: pointer to the fact if found, NULL otherwise + * + */ +static igt_fact *igt_facts_list_get(const char *name, + struct igt_list_head *head) +{ + igt_fact *fact = NULL; + + if (igt_list_empty(head)) + return NULL; + + igt_list_for_each_entry(fact, head, link) { + if (strcmp(fact->name, name) == 0) + return fact; + } + return NULL; +} + +/** + * igt_facts_list_del: + * @name: name of the fact to be added + * @head: head of the list + * @last_test: name of the last test + * @log: bool indicating if the delete operation should be logged + * + * Delete a fact from the list. + * + * Returns: bool indicating if fact was deleted from the list + * + */ +static bool igt_facts_list_del(const char *name, + struct igt_list_head *head, + const char *last_test, + bool log) +{ + igt_fact *fact = NULL; + + if (igt_list_empty(head)) + return false; + + igt_list_for_each_entry(fact, head, link) { + if (strcmp(fact->name, name) == 0) { + if (log) + igt_facts_log(last_test, fact->name, + NULL, fact->value); + + igt_list_del(&fact->link); + free(fact->name); + free(fact->value); + free(fact->last_test); + free(fact); + return true; + } + } + return false; +} + +/** + * igt_facts_list_add: + * @name: name of the fact to be added + * @value: value of the fact to be added + * @last_test: name of the last test + * @head: head of the list + * + * Returns: bool indicating if fact was added to the list + * + */ +static bool igt_facts_list_add(const char *name, + const char *value, + const char *last_test, + struct igt_list_head *head) +{ + igt_fact *new_fact = NULL, *old_fact = NULL; + bool logged = false; + + if (name == NULL || value == NULL) + return false; + + old_fact = igt_facts_list_get(name, head); + if (old_fact) { + if (strcmp(old_fact->value, value) == 0) { + old_fact->present = true; + return false; + } + igt_facts_log(last_test, name, value, old_fact->value); + logged = true; + igt_facts_list_del(name, head, last_test, false); + } + + new_fact = malloc(sizeof(igt_fact)); + if (new_fact == NULL) + return false; + + new_fact->name = strdup(name); + new_fact->value = strdup(value); + new_fact->last_test = last_test ? strdup(last_test) : NULL; + new_fact->present = true; + + if (!logged) + igt_facts_log(last_test, name, value, NULL); + + igt_list_add(&new_fact->link, head); + + return true; +} + +/** + * igt_facts_list_mark: + * @head: head of the list + * + * Mark all facts in the list as not present. Opted for the mark and sweep + * design pattern due to its simplicity and efficiency. + * + * Returns: void + */ +static void igt_facts_list_mark(struct igt_list_head *head) +{ + igt_fact *fact = NULL; + + if (igt_list_empty(head)) + return; + + igt_list_for_each_entry(fact, head, link) + fact->present = false; +} + +/** + * igt_facts_list_sweep: + * @head: head of the list + * @last_test: name of the last test + * + * Sweep the list and delete all facts that are not present. Opted for the mark + * and sweep design pattern due to its simplicity and efficiency. + * + * Returns: void + */ +static void igt_facts_list_sweep(struct igt_list_head *head, + const char *last_test) +{ + igt_fact *fact = NULL, *tmp = NULL; + + if (igt_list_empty(head)) + return; + + igt_list_for_each_entry_safe(fact, tmp, head, link) + if (!fact->present) + igt_facts_list_del(fact->name, head, last_test, true); +} + +/** + * igt_facts_list_mark_and_sweep: + * @head: head of the list + * + * Clean up the list using mark and sweep. Opted for the mark and sweep + * design pattern due to its simplicity and efficiency. + * + * Returns: void + */ +static void igt_facts_list_mark_and_sweep(struct igt_list_head *head) +{ + igt_facts_list_mark(head); + igt_facts_list_sweep(head, NULL); +} + +/** + * igt_facts_are_all_lists_empty: + * + * Returns true if all lists are empty. Used by the tool lsfacts. + * + * Returns: bool + */ +bool igt_facts_are_all_lists_empty(void) +{ + return igt_list_empty(&igt_facts_list_drm_card_head) && + igt_list_empty(&igt_facts_list_kmod_head) && + igt_list_empty(&igt_facts_list_ktaint_head) && + igt_list_empty(&igt_facts_list_pci_gpu_head); +} + +/** + * igt_facts_scan_pci_gpus: + * @last_test: name of the last test + * + * This function scans the pci bus for gpus using udev. It uses + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to + * update igt_facts_list_pci_gpu_head. + * + * Returns: void + */ +static void igt_facts_scan_pci_gpus(const char *last_test) +{ + static struct igt_list_head *head = &igt_facts_list_pci_gpu_head; + struct udev *udev = NULL; + struct udev_enumerate *enumerate = NULL; + struct udev_list_entry *devices, *dev_list_entry; + struct igt_device_card card; + char pcistr[10]; + int ret; + char *factname = NULL; + char *factvalue = NULL; + + udev = udev_new(); + if (!udev) { + igt_warn("Failed to create udev context\n"); + return; + } + + enumerate = udev_enumerate_new(udev); + if (!enumerate) { + igt_warn("Failed to create udev enumerate\n"); + udev_unref(udev); + return; + } + + ret = udev_enumerate_add_match_subsystem(enumerate, "pci"); + if (ret < 0) + goto out; + + ret = udev_enumerate_add_match_property(enumerate, + "PCI_CLASS", + "30000"); + if (ret < 0) + goto out; + + ret = udev_enumerate_add_match_property(enumerate, + "PCI_CLASS", + "38000"); + if (ret < 0) + goto out; + + ret = udev_enumerate_scan_devices(enumerate); + if (ret < 0) + goto out; + + devices = udev_enumerate_get_list_entry(enumerate); + if (!devices) + goto out; + + igt_facts_list_mark(head); + + udev_list_entry_foreach(dev_list_entry, devices) { + const char *path; + struct udev_device *udev_dev; + struct udev_list_entry *entry; + char *model = NULL; + char *codename = NULL; + igt_fact *old_fact = NULL; + + path = udev_list_entry_get_name(dev_list_entry); + udev_dev = udev_device_new_from_syspath(udev, path); + if (!udev_dev) + continue; + + /* Strip path to only the content after the last / */ + path = strrchr(path, '/'); + if (path) + path++; + else + path = "unknown"; + + strcpy(card.pci_slot_name, "-"); + + entry = udev_device_get_properties_list_entry(udev_dev); + while (entry) { + const char *name = udev_list_entry_get_name(entry); + const char *value = udev_list_entry_get_value(entry); + + entry = udev_list_entry_get_next(entry); + if (!strcmp(name, "ID_MODEL_FROM_DATABASE")) + model = strdup(value); + else if (!strcmp(name, "PCI_ID")) + igt_assert_eq(sscanf(value, "%hx:%hx", + &card.pci_vendor, + &card.pci_device), 2); + } + snprintf(pcistr, sizeof(pcistr), "%04x:%04x", + card.pci_vendor, card.pci_device); + codename = igt_device_get_pretty_name(&card, false); + + /* Set codename to null if it is the same string as pci_id */ + if (codename && strcmp(pcistr, codename) == 0) { + free(codename); + codename = NULL; + } + asprintf(&factname, "%s.%s", pci_gpu_fact, path); + asprintf(&factvalue, + "%s %s %s", + pcistr, + codename ? codename : "", + model ? model : ""); + + /** + * Loading and unloading the kmod may change the human + * readeable string in value. Do not change value if the + * pci id is the same. + */ + old_fact = igt_facts_list_get(factname, head); + if (old_fact && strncmp(old_fact->value, factvalue, 9) == 0) + old_fact->present = true; + else + igt_facts_list_add(factname, factvalue, last_test, head); + + free(codename); + free(model); + free(factname); + free(factvalue); + udev_device_unref(udev_dev); + } + + igt_facts_list_sweep(head, last_test); + +out: + udev_enumerate_unref(enumerate); + udev_unref(udev); +} + +/** + * igt_facts_scan_pci_drm_cards: + * @last_test: name of the last test + * + * This function scans the pci bus for drm cards using udev. It uses the + * igt_facts_list_mark(), igt_facts_list_add() and igt_facts_list_sweep() to + * update igt_facts_list_drm_card_head. + * + * Returns: void + */ +static void igt_facts_scan_pci_drm_cards(const char *last_test) +{ + static struct igt_list_head *head = &igt_facts_list_drm_card_head; + struct udev *udev = NULL; + struct udev_enumerate *enumerate = NULL; + struct udev_list_entry *devices, *dev_list_entry; + int ret; + char *factname = NULL; + char *factvalue = NULL; + + udev = udev_new(); + if (!udev) + return; + + enumerate = udev_enumerate_new(udev); + if (!enumerate) { + udev_unref(udev); + return; + } + + ret = udev_enumerate_add_match_subsystem(enumerate, "drm"); + if (ret < 0) + goto out; + + ret = udev_enumerate_scan_devices(enumerate); + if (ret < 0) + goto out; + + devices = udev_enumerate_get_list_entry(enumerate); + if (!devices) + goto out; + + ret = udev_enumerate_add_match_subsystem(enumerate, "drm"); + if (ret < 0) + goto out; + + ret = udev_enumerate_scan_devices(enumerate); + if (ret < 0) + goto out; + + devices = udev_enumerate_get_list_entry(enumerate); + if (!devices) + goto out; + + igt_facts_list_mark(head); + + udev_list_entry_foreach(dev_list_entry, devices) { + const char *path; + struct udev_device *drm_dev, *pci_dev; + const char *drm_name, *pci_addr; + + path = udev_list_entry_get_name(dev_list_entry); + drm_dev = udev_device_new_from_syspath(udev, path); + if (!drm_dev) + continue; + + drm_name = udev_device_get_sysname(drm_dev); + /* Filter the device by name. Want devices such as card0 and card1. + * If the device has '-' in the name, contine + */ + if (strncmp(drm_name, "card", 4) != 0 || + strchr(drm_name, '-') != NULL) { + udev_device_unref(drm_dev); + continue; + } + + /* Get the pci address of the gpu associated with the drm_dev*/ + pci_dev = udev_device_get_parent_with_subsystem_devtype(drm_dev, + "pci", + NULL); + if (pci_dev) { + pci_addr = udev_device_get_sysattr_value(pci_dev, + "address"); + if (!pci_addr) + pci_addr = udev_device_get_sysname(pci_dev); + } else { + /* Some GPUs are platform devices. Ignore them. */ + pci_addr = NULL; + udev_device_unref(drm_dev); + continue; + } + + asprintf(&factname, "%s.%s", drm_card_fact, pci_addr); + asprintf(&factvalue, "%s", drm_name); + + igt_facts_list_add(factname, factvalue, last_test, head); + + free(factname); + free(factvalue); + udev_device_unref(drm_dev); + } + + igt_facts_list_sweep(head, last_test); + +out: + udev_enumerate_unref(enumerate); + udev_unref(udev); +} + +/** + * igt_facts_scan_kernel_taints: + * @last_test: name of the last test + * + * This function scans for kernel taints using igt_kernel_tainted() and + * igt_explain_taints(). It will cut off the explanation keeping only the + * taint name. + * + * Returns: void + */ +static void igt_facts_scan_kernel_taints(const char *last_test) +{ + static struct igt_list_head *head = &igt_facts_list_ktaint_head; + unsigned long taints = 0; + const char *reason = NULL; + char *taint_name = NULL; + char *fact_name = NULL; + + taints = igt_kernel_tainted(&taints); + /* For testing, set all bits to 1 + * taints = 0xFFFFFFFF; + */ + + + igt_facts_list_mark(head); + + while ((reason = igt_explain_taints(&taints)) != NULL) { + /* Cut at the ':' to get only the taint name */ + taint_name = strtok(strdup(reason), ":"); + if (!taint_name) + continue; + + /* Lowercase taint_name */ + for (int i = 0; taint_name[i]; i++) + taint_name[i] = tolower(taint_name[i]); + + asprintf(&fact_name, "%s.%s", ktaint_fact, taint_name); + igt_facts_list_add(fact_name, "true", last_test, head); + + free(taint_name); + free(fact_name); + } + + igt_facts_list_sweep(head, last_test); +} + + +/** + * igt_facts_scan_kernel_loaded_kmods: + * @last_test: name of the last test + * + * This function scans for loaded kmods using igt_fact_kmod_list and + * igt_kmod_is_loaded(). + * + * Returns: void + */ +static void igt_facts_scan_kernel_loaded_kmods(const char *last_test) +{ + static struct igt_list_head *head = &igt_facts_list_kmod_head; + char *name = NULL; + + igt_facts_list_mark(head); + + /* Iterate over igt_fact_kmod_list[] until the element contains "\0" */ + for (int i = 0; strcmp(igt_fact_kmod_list[i], "\0") != 0; i++) { + asprintf(&name, "%s.%s", kmod_fact, igt_fact_kmod_list[i]); + if (igt_kmod_is_loaded(igt_fact_kmod_list[i])) + igt_facts_list_add(name, "true", last_test, head); + + free(name); + } + + igt_facts_list_sweep(head, last_test); +} + +/** + * igt_facts: + * @last_test: name of the last test + * + * Call this function where you want to gather and report facts. + * + * Returns: void + */ +void igt_facts(const char *last_test) +{ + igt_facts_scan_pci_gpus(last_test); + igt_facts_scan_pci_drm_cards(last_test); + igt_facts_scan_kernel_taints(last_test); + igt_facts_scan_kernel_loaded_kmods(last_test); + + fflush(stdout); + fflush(stderr); +} + +/* + * Testing + * + * Defined here to keep most of the functions static + * + */ + +/** + * igt_facts_test_add_get: + * @head: head of the list + * + * Tests igt_facts_list_add and igt_facts_list_get. + * + * Returns: void + */ +static void igt_facts_test_add_get(struct igt_list_head *head) +{ + igt_fact *fact = NULL; + bool ret; + const char *name = "hardware.pci.gpu_at_addr.0000:00:02.0"; + const char *value = "8086:64a0 Intel Lunarlake (Gen20)"; + const char *last_test = NULL; + + ret = igt_facts_list_add(name, value, last_test, head); + igt_assert(ret == true); + + // Assert that there is one element in the linked list + igt_assert_eq(igt_list_length(head), 1); + + // Assert that the element in the linked list is the one we added + fact = igt_facts_list_get(name, head); + igt_assert(fact != NULL); + igt_assert_eq(strcmp(fact->name, name), 0); + igt_assert_eq(strcmp(fact->value, value), 0); + igt_assert(fact->present == true); + igt_assert(fact->last_test == NULL); +} + +/** + * igt_facts_test_mark_and_sweep: + * @head: head of the list + * + * - Add 3 elements to the list and mark them as not present. + * - Update two of the elements and mark them as present. + * - Sweep the list and assert that + * - Only the two updated elements are present + * - The third element was deleted + * + * Returns: void + */ +static void igt_facts_test_mark_and_sweep(struct igt_list_head *head) +{ + igt_fact *fact = NULL; + const char *name1 = "hardware.pci.gpu_at_addr.0000:00:02.0"; + const char *value1 = "8086:64a0 Intel Lunarlake (Gen20)"; + const char *name2 = "hardware.pci.gpu_at_addr.0000:00:03.0"; + const char *value2 = "8086:64a1 Intel Lunarlake (Gen21)"; + const char *name3 = "hardware.pci.gpu_at_addr.0000:00:04.0"; + const char *value3 = "8086:64a2 Intel Lunarlake (Gen22)"; + + igt_facts_list_add(name1, value1, NULL, head); + igt_facts_list_add(name2, value2, NULL, head); + igt_facts_list_add(name3, value3, NULL, head); + + igt_facts_list_mark(head); + + igt_facts_list_add(name1, value1, NULL, head); + igt_facts_list_add(name2, value2, NULL, head); + + igt_facts_list_sweep(head, NULL); + + // Assert that there are two elements in the linked list + igt_assert_eq(igt_list_length(head), 2); + + // Assert that the two updated elements are present + fact = igt_facts_list_get(name1, head); + igt_assert(fact != NULL); + igt_assert(fact->present == true); + + fact = igt_facts_list_get(name2, head); + igt_assert(fact != NULL); + igt_assert(fact->present == true); + + // Assert that the third element was deleted + fact = igt_facts_list_get(name3, head); + igt_assert(fact == NULL); +} + +/** + * igt_facts_test: + * + * Main function for testing the igt_facts module + * + * Returns: bool indicating if the tests passed + */ +void igt_facts_test(void) +{ + const char *last_test = "Unit Testing"; + + igt_facts_lists_init(); + + /* Assert that all lists are empty */ + igt_assert(igt_list_empty(&igt_facts_list_kmod_head)); + igt_assert(igt_list_empty(&igt_facts_list_ktaint_head)); + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head)); + igt_assert(igt_list_empty(&igt_facts_list_drm_card_head)); + + /* Assert that add and get work. Will add one element to the list */ + igt_facts_test_add_get(&igt_facts_list_pci_gpu_head); + + /* Assert that igt_facts_list_mark_and_sweep() cleans up the list */ + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == false); + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head); + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == true); + + /* Test the mark and sweep pattern used to delete elements + * from the list + */ + igt_facts_test_mark_and_sweep(&igt_facts_list_pci_gpu_head); + + /* Clean up the list and call igt_facts(). This should not crash */ + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head); + igt_facts(last_test); +} diff --git a/lib/igt_facts.h b/lib/igt_facts.h new file mode 100644 index 000000000..e4adca3fb --- /dev/null +++ b/lib/igt_facts.h @@ -0,0 +1,47 @@ +/* SPDX-License-Identifier: MIT + * Copyright © 2024 Intel Corporation + */ + +#include <stdbool.h> + +#include "igt_list.h" + + +/* igt_fact: + * @name: name of the fact + * @value: value of the fact + * @last_test: name of the test that triggered the fact + * @present: bool indicating if fact is present. Used for deleting facts from + * the list. + * @link: link to the next fact + * + * A fact is a piece of information that can be used to determine the state of + * the system. + * + */ +typedef struct { + char *name; + char *value; + char *last_test; + bool present; /* For mark and seep */ + struct igt_list_head link; +} igt_fact; + +const char *igt_fact_kmod_list[] = { + "amdgpu", + "i915", + "nouveau", + "radeon", + "xe", + "\0" +}; + +const char *kmod_fact = "kernel.kmod_is_loaded"; /* true or false */ +const char *ktaint_fact = "kernel.is_tainted"; /* taint name: taint_warn */ +const char *pci_gpu_fact = "hardware.pci.gpu_at_addr"; /* id vendor model */ +const char *drm_card_fact = "hardware.pci.drm_card_at_addr"; /* cardX */ + +void igt_facts_lists_init(void); +void igt_facts(const char *last_test); +bool igt_facts_are_all_lists_empty(void); +void igt_facts_test(void); /* For unit testing only */ diff --git a/lib/meson.build b/lib/meson.build index c3556a921..c44ca2b5a 100644 --- a/lib/meson.build +++ b/lib/meson.build @@ -18,6 +18,7 @@ lib_sources = [ 'i915/i915_crc.c', 'igt_collection.c', 'igt_color_encoding.c', + 'igt_facts.c', 'igt_crc.c', 'igt_debugfs.c', 'igt_device.c', diff --git a/lib/tests/igt_facts.c b/lib/tests/igt_facts.c new file mode 100644 index 000000000..7fa9d0f22 --- /dev/null +++ b/lib/tests/igt_facts.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: MIT +// Copyright © 2024 Intel Corporation + +#include <stdbool.h> + +#include "igt_core.h" +#include "igt_facts.h" + +/* Tests are not defined here so we can keep most of the functions static */ + +igt_simple_main +{ + igt_info("Running igt_facts_test\n"); + igt_facts_test(); +} diff --git a/lib/tests/meson.build b/lib/tests/meson.build index df8092638..1ce19f63c 100644 --- a/lib/tests/meson.build +++ b/lib/tests/meson.build @@ -8,6 +8,7 @@ lib_tests = [ 'igt_dynamic_subtests', 'igt_edid', 'igt_exit_handler', + 'igt_facts', 'igt_fork', 'igt_fork_helper', 'igt_hook', diff --git a/runner/executor.c b/runner/executor.c index ac73e1dde..d1eca3c05 100644 --- a/runner/executor.c +++ b/runner/executor.c @@ -30,6 +30,7 @@ #include "igt_aux.h" #include "igt_core.h" +#include "igt_facts.h" #include "igt_taints.h" #include "igt_vec.h" #include "executor.h" @@ -2306,6 +2307,9 @@ bool execute(struct execute_state *state, sigset_t sigmask; double time_spent = 0.0; bool status = true; + char *last_test = NULL; + + igt_facts_lists_init(); if (state->dry) { outf("Dry run, not executing. Invoke igt_resume if you want to execute.\n"); @@ -2438,6 +2442,10 @@ bool execute(struct execute_state *state, int result; bool already_written = false; + /* Calls before running each test */ + igt_facts(last_test); + last_test = entry_display_name(&job_list->entries[state->next]); + if (should_die_because_signal(sigfd)) { status = false; goto end; @@ -2526,6 +2534,8 @@ bool execute(struct execute_state *state, return execute(state, settings, job_list); } } + /* Last call to collect facts after the last test runs */ + igt_facts(last_test); if ((timefd = openat(resdirfd, "endtime.txt", O_CREAT | O_WRONLY | O_EXCL, 0666)) >= 0) { dprintf(timefd, "%f\n", timeofday_double()); diff --git a/tools/lsfacts.c b/tools/lsfacts.c new file mode 100644 index 000000000..10dee0317 --- /dev/null +++ b/tools/lsfacts.c @@ -0,0 +1,25 @@ +// SPDX-License-Identifier: MIT +// Copyright © 2024 Intel Corporation + +#include "igt.h" +#include "igt_facts.h" + +/** + * SECTION:lsfacts + * @short_description: lsfacts + * @title: lsfacts + * @include: lsfacts.c + * + * # lsfacts + * + * Scan for igt-facts and print them on screen. Indicate if no facts are found. + */ +int main(int argc, char *argv[]) +{ + igt_facts_lists_init(); + + igt_facts("lsfacts"); + + if (igt_facts_are_all_lists_empty()) + igt_info("No facts found...\n"); +} diff --git a/tools/meson.build b/tools/meson.build index 48c9a4b50..ff1b0ef90 100644 --- a/tools/meson.build +++ b/tools/meson.build @@ -42,6 +42,7 @@ tools_progs = [ 'intel_gem_info', 'intel_gvtg_test', 'dpcd_reg', + 'lsfacts', 'lsgpu', 'power', ] -- 2.34.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* RE: [PATCH i-g-t v10] igt-runner fact checking 2024-12-05 4:54 ` [PATCH i-g-t v10] " Peter Senna Tschudin @ 2024-12-05 9:08 ` Piatkowski, Dominik Karol 2024-12-06 11:42 ` Kamil Konieczny 1 sibling, 0 replies; 13+ messages in thread From: Piatkowski, Dominik Karol @ 2024-12-05 9:08 UTC (permalink / raw) To: Peter Senna Tschudin, igt-dev@lists.freedesktop.org Cc: Knop, Ryszard, Janusz Krzysztofik, Kempczynski, Zbigniew, De Marchi, Lucas, Coelho, Luciano, Das, Nirmoy, Summers, Stuart, Ghimiray, Himal Prasad, Piecielska, Katarzyna Hi Peter, > -----Original Message----- > From: Peter Senna Tschudin <peter.senna@linux.intel.com> > Sent: Thursday, December 5, 2024 5:54 AM > To: igt-dev@lists.freedesktop.org > Cc: Knop, Ryszard <ryszard.knop@intel.com>; Janusz Krzysztofik > <janusz.krzysztofik@linux.intel.com>; Kempczynski, Zbigniew > <zbigniew.kempczynski@intel.com>; De Marchi, Lucas > <lucas.demarchi@intel.com>; Coelho, Luciano <luciano.coelho@intel.com>; > Das, Nirmoy <nirmoy.das@intel.com>; Summers, Stuart > <stuart.summers@intel.com>; Ghimiray, Himal Prasad > <himal.prasad.ghimiray@intel.com>; Piatkowski, Dominik Karol > <dominik.karol.piatkowski@intel.com>; Piecielska, Katarzyna > <katarzyna.piecielska@intel.com> > Subject: [PATCH i-g-t v10] igt-runner fact checking > > When using igt-runner, collect facts before each test and after the last test, and > report when facts change. The facts are: > - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel > Battlemage (Gen20) > - Associations between PCI GPU and DRM card: > hardware.pci.drm_card_at_addr.0000:03:00.0: card1 > - Kernel taints: kernel.is_tainted.taint_warn: true > - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true > > This change imposes little execution overhead and adds just a few lines of > logging. The facts will be printed on normal igt-runner output. Here is a real > example from our CI showing hotreplug-lateclose changing the DRM card > number and tainting the kernel on the abort path: > > [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose) > [245.383596] Starting subtest: hotreplug-lateclose [249.843361] Aborting: > Lockdep not active [249.858249] [FACT core_hotunplug (hotreplug- > lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> > card1 [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: > kernel.is_tainted.taint_die: true [249.859075] Closing watchdogs > > CC: Ryszard Knop <ryszard.knop@intel.com> > CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> > CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> > CC: Lucas De Marchi <lucas.demarchi@intel.com> > CC: luciano.coelho@intel.com > CC: nirmoy.das@intel.com > CC: stuart.summers@intel.com > CC: himal.prasad.ghimiray@intel.com > CC: dominik.karol.piatkowski@intel.com > CC: katarzyna.piecielska@intel.com > Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com> > --- > Re-sending as the mailing list server rejected the message yesterday due to a > probably full disk... > > v10: > - fix memory leaks from asprintf (Thank you Dominik Karol!) > - fix comments for consistency (Thank you Dominik Karol!) > > v9: > - do not report new hardware when loading/unloading kmod changes > the string of the GPU name. I accidentally reintroduced this > issue when refactoring to use linked lists. > - add tools/lsfacts: 9 lines of code that print either the facts > or that no facts were found. > - fix code comments describing functions > - fix white space issues > > v8: > - fix white space issues > > v7: > - refactor to use linked lists provided by igt_lists > - Added function arguments to code comments > - updated commit message > > v6: > - sort includes in igt_facts.c alphabetically > - add facts for kernel taints using igt_kernel_tainted() and > igt_explain_taints() > > v5: > - fix the broken patch format from v4 > > v4: > - fix a bug on delete_fact() > - drop glib and calls to g_ functions > - change commit message to indicate that report only on fact changes > - use consistent format for reporting changes > - fix SPDX header format > > v3: > - refreshed commit message > - changed format SPDX string > - removed license text > - replace last_test assignment when null by two ternary operators > - added function descriptions following example found elsewhere in > the code > - added igt_assert to catch failures to realloc() > > v2: > - add lib/tests/igt_facts.c for basic unit testing > - bugfix: do not report a new gpu when the driver changes the gpu name > - bugfix: do not report the pci_id twice on the gpu name > > lib/igt_facts.c | 755 > ++++++++++++++++++++++++++++++++++++++++++ > lib/igt_facts.h | 47 +++ > lib/meson.build | 1 + > lib/tests/igt_facts.c | 15 + > lib/tests/meson.build | 1 + > runner/executor.c | 10 + > tools/lsfacts.c | 25 ++ > tools/meson.build | 1 + > 8 files changed, 855 insertions(+) > create mode 100644 lib/igt_facts.c > create mode 100644 lib/igt_facts.h > create mode 100644 lib/tests/igt_facts.c create mode 100644 tools/lsfacts.c > > diff --git a/lib/igt_facts.c b/lib/igt_facts.c new file mode 100644 index > 000000000..4749d3417 > --- /dev/null > +++ b/lib/igt_facts.c > @@ -0,0 +1,755 @@ > +// SPDX-License-Identifier: MIT > +// Copyright C 2024 Intel Corporation > + > +#include <ctype.h> > +#include <libudev.h> > +#include <stdio.h> > +#include <sys/time.h> > +#include <time.h> > + > +#include "igt_core.h" > +#include "igt_device_scan.h" > +#include "igt_facts.h" > +#include "igt_kmod.h" > +#include "igt_list.h" > +#include "igt_taints.h" > + > +static struct igt_list_head igt_facts_list_drm_card_head; static struct > +igt_list_head igt_facts_list_kmod_head; static struct igt_list_head > +igt_facts_list_ktaint_head; static struct igt_list_head > +igt_facts_list_pci_gpu_head; > + > + > +/** > + * igt_facts_lists_init: > + * > + * Initialize igt_facts linked lists. > + * > + * Returns: void > + */ > +void igt_facts_lists_init(void) > +{ > + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head); > + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head); > + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head); > + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head); > +} > + > + > +/** > + * igt_facts_log: > + * @last_test: name of the test that triggered the fact > + * @name: name of the fact > + * @new_value: new value of the fact > + * @old_value: old value of the fact > + * > + * Reports fact changes: > + * - new fact: if old_value is NULL and new_value is not NULL > + * - deleted fact: if new_value is NULL and old_value is not NULL > + * - changed fact: if new_value is different from old_value > + * > + * Returns: void > + */ > +static void igt_facts_log(const char *last_test, const char *name, > + const char *new_value, const char *old_value) { > + struct timespec uptime_ts; > + char *uptime = NULL; > + const char *before_tests = "before any test"; > + > + if (old_value == NULL && new_value == NULL) > + return; > + > + if (clock_gettime(CLOCK_BOOTTIME, &uptime_ts) != 0) > + return; > + > + asprintf(&uptime, > + "%ld.%06ld", > + uptime_ts.tv_sec, > + uptime_ts.tv_nsec / 1000); > + > + /* New fact */ > + if (old_value == NULL && new_value != NULL) { > + igt_info("[%s] [FACT %s] new: %s: %s\n", > + uptime, > + last_test ? last_test : before_tests, > + name, > + new_value); > + goto out; > + } > + > + /* Update fact */ > + if (old_value != NULL && new_value != NULL) { > + igt_info("[%s] [FACT %s] changed: %s: %s -> %s\n", > + uptime, > + last_test ? last_test : before_tests, > + name, > + old_value, > + new_value); > + goto out; > + } > + > + /* Deleted fact */ > + if (old_value != NULL && new_value == NULL) { > + igt_info("[%s] [FACT %s] deleted: %s: %s\n", > + uptime, > + last_test ? last_test : before_tests, > + name, > + old_value); > + goto out; > + } > + > +out: > + free(uptime); > +} > + > +/** > + * igt_facts_list_get: > + * @name: name of the fact to be added > + * @head: head of the list > + * > + * Get a fact from the list. > + * > + * Returns: pointer to the fact if found, NULL otherwise > + * > + */ > +static igt_fact *igt_facts_list_get(const char *name, > + struct igt_list_head *head) > +{ > + igt_fact *fact = NULL; > + > + if (igt_list_empty(head)) > + return NULL; > + > + igt_list_for_each_entry(fact, head, link) { > + if (strcmp(fact->name, name) == 0) > + return fact; > + } > + return NULL; > +} > + > +/** > + * igt_facts_list_del: > + * @name: name of the fact to be added > + * @head: head of the list > + * @last_test: name of the last test > + * @log: bool indicating if the delete operation should be logged > + * > + * Delete a fact from the list. > + * > + * Returns: bool indicating if fact was deleted from the list > + * > + */ > +static bool igt_facts_list_del(const char *name, > + struct igt_list_head *head, > + const char *last_test, > + bool log) > +{ > + igt_fact *fact = NULL; > + > + if (igt_list_empty(head)) > + return false; > + > + igt_list_for_each_entry(fact, head, link) { > + if (strcmp(fact->name, name) == 0) { > + if (log) > + igt_facts_log(last_test, fact->name, > + NULL, fact->value); > + > + igt_list_del(&fact->link); > + free(fact->name); > + free(fact->value); > + free(fact->last_test); > + free(fact); > + return true; > + } > + } > + return false; > +} > + > +/** > + * igt_facts_list_add: > + * @name: name of the fact to be added > + * @value: value of the fact to be added > + * @last_test: name of the last test > + * @head: head of the list > + * > + * Returns: bool indicating if fact was added to the list > + * > + */ > +static bool igt_facts_list_add(const char *name, > + const char *value, > + const char *last_test, > + struct igt_list_head *head) > +{ > + igt_fact *new_fact = NULL, *old_fact = NULL; > + bool logged = false; > + > + if (name == NULL || value == NULL) > + return false; > + > + old_fact = igt_facts_list_get(name, head); > + if (old_fact) { > + if (strcmp(old_fact->value, value) == 0) { > + old_fact->present = true; > + return false; > + } > + igt_facts_log(last_test, name, value, old_fact->value); > + logged = true; > + igt_facts_list_del(name, head, last_test, false); > + } > + > + new_fact = malloc(sizeof(igt_fact)); > + if (new_fact == NULL) > + return false; > + > + new_fact->name = strdup(name); > + new_fact->value = strdup(value); > + new_fact->last_test = last_test ? strdup(last_test) : NULL; > + new_fact->present = true; > + > + if (!logged) > + igt_facts_log(last_test, name, value, NULL); > + > + igt_list_add(&new_fact->link, head); > + > + return true; > +} > + > +/** > + * igt_facts_list_mark: > + * @head: head of the list > + * > + * Mark all facts in the list as not present. Opted for the mark and > +sweep > + * design pattern due to its simplicity and efficiency. > + * > + * Returns: void > + */ > +static void igt_facts_list_mark(struct igt_list_head *head) { > + igt_fact *fact = NULL; > + > + if (igt_list_empty(head)) > + return; > + > + igt_list_for_each_entry(fact, head, link) > + fact->present = false; > +} > + > +/** > + * igt_facts_list_sweep: > + * @head: head of the list > + * @last_test: name of the last test > + * > + * Sweep the list and delete all facts that are not present. Opted for > +the mark > + * and sweep design pattern due to its simplicity and efficiency. > + * > + * Returns: void > + */ > +static void igt_facts_list_sweep(struct igt_list_head *head, > + const char *last_test) > +{ > + igt_fact *fact = NULL, *tmp = NULL; > + > + if (igt_list_empty(head)) > + return; > + > + igt_list_for_each_entry_safe(fact, tmp, head, link) > + if (!fact->present) > + igt_facts_list_del(fact->name, head, last_test, true); } > + > +/** > + * igt_facts_list_mark_and_sweep: > + * @head: head of the list > + * > + * Clean up the list using mark and sweep. Opted for the mark and sweep > + * design pattern due to its simplicity and efficiency. > + * > + * Returns: void > + */ > +static void igt_facts_list_mark_and_sweep(struct igt_list_head *head) { > + igt_facts_list_mark(head); > + igt_facts_list_sweep(head, NULL); > +} > + > +/** > + * igt_facts_are_all_lists_empty: > + * > + * Returns true if all lists are empty. Used by the tool lsfacts. > + * > + * Returns: bool > + */ > +bool igt_facts_are_all_lists_empty(void) > +{ > + return igt_list_empty(&igt_facts_list_drm_card_head) && > + igt_list_empty(&igt_facts_list_kmod_head) && > + igt_list_empty(&igt_facts_list_ktaint_head) && > + igt_list_empty(&igt_facts_list_pci_gpu_head); > +} > + > +/** > + * igt_facts_scan_pci_gpus: > + * @last_test: name of the last test > + * > + * This function scans the pci bus for gpus using udev. It uses > + * igt_facts_list_mark(), igt_facts_list_add() and > +igt_facts_list_sweep() to > + * update igt_facts_list_pci_gpu_head. > + * > + * Returns: void > + */ > +static void igt_facts_scan_pci_gpus(const char *last_test) { > + static struct igt_list_head *head = &igt_facts_list_pci_gpu_head; > + struct udev *udev = NULL; > + struct udev_enumerate *enumerate = NULL; > + struct udev_list_entry *devices, *dev_list_entry; > + struct igt_device_card card; > + char pcistr[10]; > + int ret; > + char *factname = NULL; > + char *factvalue = NULL; > + > + udev = udev_new(); > + if (!udev) { > + igt_warn("Failed to create udev context\n"); > + return; > + } > + > + enumerate = udev_enumerate_new(udev); > + if (!enumerate) { > + igt_warn("Failed to create udev enumerate\n"); > + udev_unref(udev); > + return; > + } > + > + ret = udev_enumerate_add_match_subsystem(enumerate, "pci"); > + if (ret < 0) > + goto out; > + > + ret = udev_enumerate_add_match_property(enumerate, > + "PCI_CLASS", > + "30000"); > + if (ret < 0) > + goto out; > + > + ret = udev_enumerate_add_match_property(enumerate, > + "PCI_CLASS", > + "38000"); > + if (ret < 0) > + goto out; > + > + ret = udev_enumerate_scan_devices(enumerate); > + if (ret < 0) > + goto out; > + > + devices = udev_enumerate_get_list_entry(enumerate); > + if (!devices) > + goto out; > + > + igt_facts_list_mark(head); > + > + udev_list_entry_foreach(dev_list_entry, devices) { > + const char *path; > + struct udev_device *udev_dev; > + struct udev_list_entry *entry; > + char *model = NULL; > + char *codename = NULL; > + igt_fact *old_fact = NULL; > + > + path = udev_list_entry_get_name(dev_list_entry); > + udev_dev = udev_device_new_from_syspath(udev, path); > + if (!udev_dev) > + continue; > + > + /* Strip path to only the content after the last / */ > + path = strrchr(path, '/'); > + if (path) > + path++; > + else > + path = "unknown"; > + > + strcpy(card.pci_slot_name, "-"); > + > + entry = udev_device_get_properties_list_entry(udev_dev); > + while (entry) { > + const char *name = udev_list_entry_get_name(entry); > + const char *value = udev_list_entry_get_value(entry); > + > + entry = udev_list_entry_get_next(entry); > + if (!strcmp(name, "ID_MODEL_FROM_DATABASE")) > + model = strdup(value); > + else if (!strcmp(name, "PCI_ID")) > + igt_assert_eq(sscanf(value, "%hx:%hx", > + &card.pci_vendor, > + &card.pci_device), 2); > + } > + snprintf(pcistr, sizeof(pcistr), "%04x:%04x", > + card.pci_vendor, card.pci_device); > + codename = igt_device_get_pretty_name(&card, false); > + > + /* Set codename to null if it is the same string as pci_id */ > + if (codename && strcmp(pcistr, codename) == 0) { > + free(codename); > + codename = NULL; > + } > + asprintf(&factname, "%s.%s", pci_gpu_fact, path); > + asprintf(&factvalue, > + "%s %s %s", > + pcistr, > + codename ? codename : "", > + model ? model : ""); > + > + /** > + * Loading and unloading the kmod may change the human > + * readeable string in value. Do not change value if the > + * pci id is the same. > + */ > + old_fact = igt_facts_list_get(factname, head); > + if (old_fact && strncmp(old_fact->value, factvalue, 9) == 0) > + old_fact->present = true; > + else > + igt_facts_list_add(factname, factvalue, last_test, > head); > + > + free(codename); > + free(model); > + free(factname); > + free(factvalue); > + udev_device_unref(udev_dev); > + } > + > + igt_facts_list_sweep(head, last_test); > + > +out: > + udev_enumerate_unref(enumerate); > + udev_unref(udev); > +} > + > +/** > + * igt_facts_scan_pci_drm_cards: > + * @last_test: name of the last test > + * > + * This function scans the pci bus for drm cards using udev. It uses > +the > + * igt_facts_list_mark(), igt_facts_list_add() and > +igt_facts_list_sweep() to > + * update igt_facts_list_drm_card_head. > + * > + * Returns: void > + */ > +static void igt_facts_scan_pci_drm_cards(const char *last_test) { > + static struct igt_list_head *head = &igt_facts_list_drm_card_head; > + struct udev *udev = NULL; > + struct udev_enumerate *enumerate = NULL; > + struct udev_list_entry *devices, *dev_list_entry; > + int ret; > + char *factname = NULL; > + char *factvalue = NULL; > + > + udev = udev_new(); > + if (!udev) > + return; > + > + enumerate = udev_enumerate_new(udev); > + if (!enumerate) { > + udev_unref(udev); > + return; > + } > + > + ret = udev_enumerate_add_match_subsystem(enumerate, "drm"); > + if (ret < 0) > + goto out; > + > + ret = udev_enumerate_scan_devices(enumerate); > + if (ret < 0) > + goto out; > + > + devices = udev_enumerate_get_list_entry(enumerate); > + if (!devices) > + goto out; > + > + ret = udev_enumerate_add_match_subsystem(enumerate, "drm"); > + if (ret < 0) > + goto out; > + > + ret = udev_enumerate_scan_devices(enumerate); > + if (ret < 0) > + goto out; > + > + devices = udev_enumerate_get_list_entry(enumerate); > + if (!devices) > + goto out; > + > + igt_facts_list_mark(head); > + > + udev_list_entry_foreach(dev_list_entry, devices) { > + const char *path; > + struct udev_device *drm_dev, *pci_dev; > + const char *drm_name, *pci_addr; > + > + path = udev_list_entry_get_name(dev_list_entry); > + drm_dev = udev_device_new_from_syspath(udev, path); > + if (!drm_dev) > + continue; > + > + drm_name = udev_device_get_sysname(drm_dev); > + /* Filter the device by name. Want devices such as card0 and > card1. > + * If the device has '-' in the name, contine > + */ > + if (strncmp(drm_name, "card", 4) != 0 || > + strchr(drm_name, '-') != NULL) { > + udev_device_unref(drm_dev); > + continue; > + } > + > + /* Get the pci address of the gpu associated with the > drm_dev*/ > + pci_dev = > udev_device_get_parent_with_subsystem_devtype(drm_dev, > + "pci", > + > NULL); > + if (pci_dev) { > + pci_addr = udev_device_get_sysattr_value(pci_dev, > + "address"); > + if (!pci_addr) > + pci_addr = > udev_device_get_sysname(pci_dev); > + } else { > + /* Some GPUs are platform devices. Ignore them. */ > + pci_addr = NULL; > + udev_device_unref(drm_dev); > + continue; > + } > + > + asprintf(&factname, "%s.%s", drm_card_fact, pci_addr); > + asprintf(&factvalue, "%s", drm_name); > + > + igt_facts_list_add(factname, factvalue, last_test, head); > + > + free(factname); > + free(factvalue); > + udev_device_unref(drm_dev); > + } > + > + igt_facts_list_sweep(head, last_test); > + > +out: > + udev_enumerate_unref(enumerate); > + udev_unref(udev); > +} > + > +/** > + * igt_facts_scan_kernel_taints: > + * @last_test: name of the last test > + * > + * This function scans for kernel taints using igt_kernel_tainted() and > + * igt_explain_taints(). It will cut off the explanation keeping only > +the > + * taint name. > + * > + * Returns: void > + */ > +static void igt_facts_scan_kernel_taints(const char *last_test) { > + static struct igt_list_head *head = &igt_facts_list_ktaint_head; > + unsigned long taints = 0; > + const char *reason = NULL; > + char *taint_name = NULL; > + char *fact_name = NULL; > + > + taints = igt_kernel_tainted(&taints); > + /* For testing, set all bits to 1 > + * taints = 0xFFFFFFFF; > + */ > + > + > + igt_facts_list_mark(head); > + > + while ((reason = igt_explain_taints(&taints)) != NULL) { > + /* Cut at the ':' to get only the taint name */ > + taint_name = strtok(strdup(reason), ":"); > + if (!taint_name) > + continue; > + > + /* Lowercase taint_name */ > + for (int i = 0; taint_name[i]; i++) > + taint_name[i] = tolower(taint_name[i]); > + > + asprintf(&fact_name, "%s.%s", ktaint_fact, taint_name); > + igt_facts_list_add(fact_name, "true", last_test, head); > + > + free(taint_name); > + free(fact_name); > + } > + > + igt_facts_list_sweep(head, last_test); } > + > + > +/** > + * igt_facts_scan_kernel_loaded_kmods: > + * @last_test: name of the last test > + * > + * This function scans for loaded kmods using igt_fact_kmod_list and > + * igt_kmod_is_loaded(). > + * > + * Returns: void > + */ > +static void igt_facts_scan_kernel_loaded_kmods(const char *last_test) { > + static struct igt_list_head *head = &igt_facts_list_kmod_head; > + char *name = NULL; > + > + igt_facts_list_mark(head); > + > + /* Iterate over igt_fact_kmod_list[] until the element contains "\0" */ > + for (int i = 0; strcmp(igt_fact_kmod_list[i], "\0") != 0; i++) { > + asprintf(&name, "%s.%s", kmod_fact, igt_fact_kmod_list[i]); > + if (igt_kmod_is_loaded(igt_fact_kmod_list[i])) > + igt_facts_list_add(name, "true", last_test, head); > + > + free(name); > + } > + > + igt_facts_list_sweep(head, last_test); } > + > +/** > + * igt_facts: > + * @last_test: name of the last test > + * > + * Call this function where you want to gather and report facts. > + * > + * Returns: void > + */ > +void igt_facts(const char *last_test) > +{ > + igt_facts_scan_pci_gpus(last_test); > + igt_facts_scan_pci_drm_cards(last_test); > + igt_facts_scan_kernel_taints(last_test); > + igt_facts_scan_kernel_loaded_kmods(last_test); > + > + fflush(stdout); > + fflush(stderr); > +} > + > +/* > + * Testing > + * > + * Defined here to keep most of the functions static > + * > + */ > + > +/** > + * igt_facts_test_add_get: > + * @head: head of the list > + * > + * Tests igt_facts_list_add and igt_facts_list_get. > + * > + * Returns: void > + */ > +static void igt_facts_test_add_get(struct igt_list_head *head) { > + igt_fact *fact = NULL; > + bool ret; > + const char *name = "hardware.pci.gpu_at_addr.0000:00:02.0"; > + const char *value = "8086:64a0 Intel Lunarlake (Gen20)"; > + const char *last_test = NULL; > + > + ret = igt_facts_list_add(name, value, last_test, head); > + igt_assert(ret == true); > + > + // Assert that there is one element in the linked list > + igt_assert_eq(igt_list_length(head), 1); > + > + // Assert that the element in the linked list is the one we added > + fact = igt_facts_list_get(name, head); > + igt_assert(fact != NULL); > + igt_assert_eq(strcmp(fact->name, name), 0); > + igt_assert_eq(strcmp(fact->value, value), 0); > + igt_assert(fact->present == true); > + igt_assert(fact->last_test == NULL); > +} > + > +/** > + * igt_facts_test_mark_and_sweep: > + * @head: head of the list > + * > + * - Add 3 elements to the list and mark them as not present. > + * - Update two of the elements and mark them as present. > + * - Sweep the list and assert that > + * - Only the two updated elements are present > + * - The third element was deleted > + * > + * Returns: void > + */ > +static void igt_facts_test_mark_and_sweep(struct igt_list_head *head) { > + igt_fact *fact = NULL; > + const char *name1 = "hardware.pci.gpu_at_addr.0000:00:02.0"; > + const char *value1 = "8086:64a0 Intel Lunarlake (Gen20)"; > + const char *name2 = "hardware.pci.gpu_at_addr.0000:00:03.0"; > + const char *value2 = "8086:64a1 Intel Lunarlake (Gen21)"; > + const char *name3 = "hardware.pci.gpu_at_addr.0000:00:04.0"; > + const char *value3 = "8086:64a2 Intel Lunarlake (Gen22)"; > + > + igt_facts_list_add(name1, value1, NULL, head); > + igt_facts_list_add(name2, value2, NULL, head); > + igt_facts_list_add(name3, value3, NULL, head); > + > + igt_facts_list_mark(head); > + > + igt_facts_list_add(name1, value1, NULL, head); > + igt_facts_list_add(name2, value2, NULL, head); > + > + igt_facts_list_sweep(head, NULL); > + > + // Assert that there are two elements in the linked list > + igt_assert_eq(igt_list_length(head), 2); > + > + // Assert that the two updated elements are present > + fact = igt_facts_list_get(name1, head); > + igt_assert(fact != NULL); > + igt_assert(fact->present == true); > + > + fact = igt_facts_list_get(name2, head); > + igt_assert(fact != NULL); > + igt_assert(fact->present == true); > + > + // Assert that the third element was deleted > + fact = igt_facts_list_get(name3, head); > + igt_assert(fact == NULL); > +} > + > +/** > + * igt_facts_test: > + * > + * Main function for testing the igt_facts module > + * > + * Returns: bool indicating if the tests passed */ void > +igt_facts_test(void) { > + const char *last_test = "Unit Testing"; > + > + igt_facts_lists_init(); > + > + /* Assert that all lists are empty */ > + igt_assert(igt_list_empty(&igt_facts_list_kmod_head)); > + igt_assert(igt_list_empty(&igt_facts_list_ktaint_head)); > + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head)); > + igt_assert(igt_list_empty(&igt_facts_list_drm_card_head)); > + > + /* Assert that add and get work. Will add one element to the list */ > + igt_facts_test_add_get(&igt_facts_list_pci_gpu_head); > + > + /* Assert that igt_facts_list_mark_and_sweep() cleans up the list */ > + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == false); > + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head); > + igt_assert(igt_list_empty(&igt_facts_list_pci_gpu_head) == true); > + > + /* Test the mark and sweep pattern used to delete elements > + * from the list > + */ > + igt_facts_test_mark_and_sweep(&igt_facts_list_pci_gpu_head); > + > + /* Clean up the list and call igt_facts(). This should not crash */ > + igt_facts_list_mark_and_sweep(&igt_facts_list_pci_gpu_head); > + igt_facts(last_test); > +} > diff --git a/lib/igt_facts.h b/lib/igt_facts.h new file mode 100644 index > 000000000..e4adca3fb > --- /dev/null > +++ b/lib/igt_facts.h > @@ -0,0 +1,47 @@ > +/* SPDX-License-Identifier: MIT > + * Copyright C 2024 Intel Corporation > + */ > + > +#include <stdbool.h> > + > +#include "igt_list.h" > + > + > +/* igt_fact: > + * @name: name of the fact > + * @value: value of the fact > + * @last_test: name of the test that triggered the fact > + * @present: bool indicating if fact is present. Used for deleting > +facts from > + * the list. > + * @link: link to the next fact > + * > + * A fact is a piece of information that can be used to determine the > +state of > + * the system. > + * > + */ > +typedef struct { > + char *name; > + char *value; > + char *last_test; > + bool present; /* For mark and seep */ Typo: seep -> sweep With this fixed, Reviewed-by: Dominik Karol Piątkowski <dominik.karol.piatkowski@intel.com> > + struct igt_list_head link; > +} igt_fact; > + > +const char *igt_fact_kmod_list[] = { > + "amdgpu", > + "i915", > + "nouveau", > + "radeon", > + "xe", > + "\0" > +}; > + > +const char *kmod_fact = "kernel.kmod_is_loaded"; /* true or false */ > +const char *ktaint_fact = "kernel.is_tainted"; /* taint name: taint_warn */ > +const char *pci_gpu_fact = "hardware.pci.gpu_at_addr"; /* id vendor > +model */ const char *drm_card_fact = "hardware.pci.drm_card_at_addr"; > +/* cardX */ > + > +void igt_facts_lists_init(void); > +void igt_facts(const char *last_test); > +bool igt_facts_are_all_lists_empty(void); > +void igt_facts_test(void); /* For unit testing only */ > diff --git a/lib/meson.build b/lib/meson.build index c3556a921..c44ca2b5a > 100644 > --- a/lib/meson.build > +++ b/lib/meson.build > @@ -18,6 +18,7 @@ lib_sources = [ > 'i915/i915_crc.c', > 'igt_collection.c', > 'igt_color_encoding.c', > + 'igt_facts.c', > 'igt_crc.c', > 'igt_debugfs.c', > 'igt_device.c', > diff --git a/lib/tests/igt_facts.c b/lib/tests/igt_facts.c new file mode 100644 > index 000000000..7fa9d0f22 > --- /dev/null > +++ b/lib/tests/igt_facts.c > @@ -0,0 +1,15 @@ > +// SPDX-License-Identifier: MIT > +// Copyright C 2024 Intel Corporation > + > +#include <stdbool.h> > + > +#include "igt_core.h" > +#include "igt_facts.h" > + > +/* Tests are not defined here so we can keep most of the functions > +static */ > + > +igt_simple_main > +{ > + igt_info("Running igt_facts_test\n"); > + igt_facts_test(); > +} > diff --git a/lib/tests/meson.build b/lib/tests/meson.build index > df8092638..1ce19f63c 100644 > --- a/lib/tests/meson.build > +++ b/lib/tests/meson.build > @@ -8,6 +8,7 @@ lib_tests = [ > 'igt_dynamic_subtests', > 'igt_edid', > 'igt_exit_handler', > + 'igt_facts', > 'igt_fork', > 'igt_fork_helper', > 'igt_hook', > diff --git a/runner/executor.c b/runner/executor.c index > ac73e1dde..d1eca3c05 100644 > --- a/runner/executor.c > +++ b/runner/executor.c > @@ -30,6 +30,7 @@ > > #include "igt_aux.h" > #include "igt_core.h" > +#include "igt_facts.h" > #include "igt_taints.h" > #include "igt_vec.h" > #include "executor.h" > @@ -2306,6 +2307,9 @@ bool execute(struct execute_state *state, > sigset_t sigmask; > double time_spent = 0.0; > bool status = true; > + char *last_test = NULL; > + > + igt_facts_lists_init(); > > if (state->dry) { > outf("Dry run, not executing. Invoke igt_resume if you want to > execute.\n"); @@ -2438,6 +2442,10 @@ bool execute(struct execute_state > *state, > int result; > bool already_written = false; > > + /* Calls before running each test */ > + igt_facts(last_test); > + last_test = entry_display_name(&job_list->entries[state- > >next]); > + > if (should_die_because_signal(sigfd)) { > status = false; > goto end; > @@ -2526,6 +2534,8 @@ bool execute(struct execute_state *state, > return execute(state, settings, job_list); > } > } > + /* Last call to collect facts after the last test runs */ > + igt_facts(last_test); > > if ((timefd = openat(resdirfd, "endtime.txt", O_CREAT | O_WRONLY | > O_EXCL, 0666)) >= 0) { > dprintf(timefd, "%f\n", timeofday_double()); diff --git > a/tools/lsfacts.c b/tools/lsfacts.c new file mode 100644 index > 000000000..10dee0317 > --- /dev/null > +++ b/tools/lsfacts.c > @@ -0,0 +1,25 @@ > +// SPDX-License-Identifier: MIT > +// Copyright C 2024 Intel Corporation > + > +#include "igt.h" > +#include "igt_facts.h" > + > +/** > + * SECTION:lsfacts > + * @short_description: lsfacts > + * @title: lsfacts > + * @include: lsfacts.c > + * > + * # lsfacts > + * > + * Scan for igt-facts and print them on screen. Indicate if no facts are found. > + */ > +int main(int argc, char *argv[]) > +{ > + igt_facts_lists_init(); > + > + igt_facts("lsfacts"); > + > + if (igt_facts_are_all_lists_empty()) > + igt_info("No facts found...\n"); > +} > diff --git a/tools/meson.build b/tools/meson.build index > 48c9a4b50..ff1b0ef90 100644 > --- a/tools/meson.build > +++ b/tools/meson.build > @@ -42,6 +42,7 @@ tools_progs = [ > 'intel_gem_info', > 'intel_gvtg_test', > 'dpcd_reg', > + 'lsfacts', > 'lsgpu', > 'power', > ] > -- > 2.34.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking 2024-12-05 4:54 ` [PATCH i-g-t v10] " Peter Senna Tschudin 2024-12-05 9:08 ` Piatkowski, Dominik Karol @ 2024-12-06 11:42 ` Kamil Konieczny 2024-12-06 13:16 ` Peter Senna Tschudin 1 sibling, 1 reply; 13+ messages in thread From: Kamil Konieczny @ 2024-12-06 11:42 UTC (permalink / raw) To: Peter Senna Tschudin Cc: igt-dev@lists.freedesktop.org, Ryszard Knop, Janusz Krzysztofik, Zbigniew Kempczyński, Lucas De Marchi, luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray, dominik.karol.piatkowski, katarzyna.piecielska, Petri Latvala, Juha-Pekka Heikkila, Juha-Pekka Heikkila, Swati Sharma, Jani Saarinen, Jani Nikula, Rob Clark, Rob Clark, Helen Koike, Melissa Wen, Maíra Canal Hi Peter, On 2024-12-05 at 05:54:26 +0100, Peter Senna Tschudin wrote: overall looks good, I have few nits, see below. As for subject, it usally informs what changed, so "igt-runner fact checking" is a little misleading. Please see my ask about that below. > When using igt-runner, collect facts before each test and after the > last test, and report when facts change. The facts are: > - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20) > - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1 > - Kernel taints: kernel.is_tainted.taint_warn: true > - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true > > This change imposes little execution overhead and adds just a few imho 'little overhead' is beause facts are only four but I agree we could change implementation when its number grows. > lines of logging. The facts will be printed on normal igt-runner > output. Here is a real example from our CI showing > hotreplug-lateclose changing the DRM card number and tainting the > kernel on the abort path: > > [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose) > [245.383596] Starting subtest: hotreplug-lateclose > [249.843361] Aborting: Lockdep not active > [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 > [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true > [249.859075] Closing watchdogs > > CC: Ryszard Knop <ryszard.knop@intel.com> > CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> > CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> > CC: Lucas De Marchi <lucas.demarchi@intel.com> > CC: luciano.coelho@intel.com > CC: nirmoy.das@intel.com > CC: stuart.summers@intel.com > CC: himal.prasad.ghimiray@intel.com > CC: dominik.karol.piatkowski@intel.com > CC: katarzyna.piecielska@intel.com +cc Petri + kms devs Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Cc: Juha-Pekka Heikkila <juha-pekka.heikkila@intel.com> Cc: Swati Sharma <swati2.sharma@intel.com> Cc: Jani Saarinen <jani.saarinen@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> +other gpu devs Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Clark <robdclark@chromium.org> Cc: Helen Koike <helen.koike@collabora.com> +drm ci dev Cc: Melissa Wen <mwen@igalia.com> Cc: Maíra Canal <mcanal@igalia.com> > Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com> > --- > Re-sending as the mailing list server rejected the message > yesterday due to a probably full disk... > > v10: > - fix memory leaks from asprintf (Thank you Dominik Karol!) > - fix comments for consistency (Thank you Dominik Karol!) > > v9: > - do not report new hardware when loading/unloading kmod changes > the string of the GPU name. I accidentally reintroduced this > issue when refactoring to use linked lists. > - add tools/lsfacts: 9 lines of code that print either the facts > or that no facts were found. > - fix code comments describing functions > - fix white space issues > > v8: > - fix white space issues > > v7: > - refactor to use linked lists provided by igt_lists > - Added function arguments to code comments > - updated commit message > > v6: > - sort includes in igt_facts.c alphabetically > - add facts for kernel taints using igt_kernel_tainted() and > igt_explain_taints() > > v5: > - fix the broken patch format from v4 > > v4: > - fix a bug on delete_fact() > - drop glib and calls to g_ functions > - change commit message to indicate that report only on fact changes > - use consistent format for reporting changes > - fix SPDX header format > > v3: > - refreshed commit message > - changed format SPDX string > - removed license text > - replace last_test assignment when null by two ternary operators > - added function descriptions following example found elsewhere in > the code > - added igt_assert to catch failures to realloc() > > v2: > - add lib/tests/igt_facts.c for basic unit testing > - bugfix: do not report a new gpu when the driver changes the gpu name > - bugfix: do not report the pci_id twice on the gpu name > > lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++ > lib/igt_facts.h | 47 +++ > lib/meson.build | 1 + > lib/tests/igt_facts.c | 15 + > lib/tests/meson.build | 1 + Please split your patchset into three, add above files to first patch: [PATCH i-g-t v11 1/3] lib/igt_facts: add fact checking lib > runner/executor.c | 10 + [PATCH i-g-t v11 2/3] runner/executor: Check facts at tests run > tools/lsfacts.c | 25 ++ > tools/meson.build | 1 + [PATCH i-g-t v11 3/3] tools/lsfacts: Add facts listing tool and it all with cover letter. > 8 files changed, 855 insertions(+) > create mode 100644 lib/igt_facts.c > create mode 100644 lib/igt_facts.h > create mode 100644 lib/tests/igt_facts.c > create mode 100644 tools/lsfacts.c > > diff --git a/lib/igt_facts.c b/lib/igt_facts.c > new file mode 100644 > index 000000000..4749d3417 > --- /dev/null > +++ b/lib/igt_facts.c > @@ -0,0 +1,755 @@ > +// SPDX-License-Identifier: MIT > +// Copyright © 2024 Intel Corporation Copyright is usally formatted in C-comment style /* comment */ to allow extending it (or adding Author: fields, etc) See other C files for example tests/device_reset.c So here: /* * Copyright © 2024 Intel Corporation */ Do the same in other new files. Regards, Kamil > + > +#include <ctype.h> > +#include <libudev.h> > +#include <stdio.h> > +#include <sys/time.h> > +#include <time.h> > + > +#include "igt_core.h" > +#include "igt_device_scan.h" > +#include "igt_facts.h" > +#include "igt_kmod.h" > +#include "igt_list.h" > +#include "igt_taints.h" > + > +static struct igt_list_head igt_facts_list_drm_card_head; > +static struct igt_list_head igt_facts_list_kmod_head; > +static struct igt_list_head igt_facts_list_ktaint_head; > +static struct igt_list_head igt_facts_list_pci_gpu_head; > + > + > +/** > + * igt_facts_lists_init: > + * > + * Initialize igt_facts linked lists. > + * > + * Returns: void > + */ > +void igt_facts_lists_init(void) > +{ > + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head); > + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head); > + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head); > + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head); > +} > + > + > +/** > + * igt_facts_log: > + * @last_test: name of the test that triggered the fact > + * @name: name of the fact > + * @new_value: new value of the fact > + * @old_value: old value of the fact > + * > + * Reports fact changes: > + * - new fact: if old_value is NULL and new_value is not NULL > + * - deleted fact: if new_value is NULL and old_value is not NULL > + * - changed fact: if new_value is different from old_value > + * > + * Returns: void > + */ > +static void igt_facts_log(const char *last_test, const char *name, > + const char *new_value, const char *old_value) > +{ ...cut... > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking 2024-12-06 11:42 ` Kamil Konieczny @ 2024-12-06 13:16 ` Peter Senna Tschudin 2024-12-06 16:46 ` Kamil Konieczny 0 siblings, 1 reply; 13+ messages in thread From: Peter Senna Tschudin @ 2024-12-06 13:16 UTC (permalink / raw) To: Kamil Konieczny, igt-dev@lists.freedesktop.org, Ryszard Knop, Janusz Krzysztofik, Zbigniew Kempczyński, Lucas De Marchi, luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray, dominik.karol.piatkowski, katarzyna.piecielska, Petri Latvala, Juha-Pekka Heikkila, Juha-Pekka Heikkila, Swati Sharma, Jani Saarinen, Jani Nikula, Rob Clark, Rob Clark, Helen Koike, Melissa Wen, Maíra Canal Hi Kamil, Thank you for your comments. I appreciate the opportunity to clarify and address your concerns. On 06.12.2024 12:42, Kamil Konieczny wrote: > Hi Peter, > On 2024-12-05 at 05:54:26 +0100, Peter Senna Tschudin wrote: > > overall looks good, I have few nits, see below. Thank you! Our CI tested it thoroughly, it creates value, and it works well. It is time to merge it. > > As for subject, it usally informs what changed, so "igt-runner fact checking" > is a little misleading. Please see my ask about that below. Please explain what is misleading about the subject. It looks good to me. > >> When using igt-runner, collect facts before each test and after the >> last test, and report when facts change. The facts are: >> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20) >> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1 >> - Kernel taints: kernel.is_tainted.taint_warn: true >> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true >> >> This change imposes little execution overhead and adds just a few > > imho 'little overhead' is beause facts are only four but I agree > we could change implementation when its number grows. Little overhead is because of how I implemented the code. The execution is efficient and produces as little text output as possible. > >> lines of logging. The facts will be printed on normal igt-runner >> output. Here is a real example from our CI showing >> hotreplug-lateclose changing the DRM card number and tainting the >> kernel on the abort path: >> >> [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose) >> [245.383596] Starting subtest: hotreplug-lateclose >> [249.843361] Aborting: Lockdep not active >> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 >> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true >> [249.859075] Closing watchdogs >> >> CC: Ryszard Knop <ryszard.knop@intel.com> >> CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> >> CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> >> CC: Lucas De Marchi <lucas.demarchi@intel.com> >> CC: luciano.coelho@intel.com >> CC: nirmoy.das@intel.com >> CC: stuart.summers@intel.com >> CC: himal.prasad.ghimiray@intel.com >> CC: dominik.karol.piatkowski@intel.com >> CC: katarzyna.piecielska@intel.com > > +cc Petri + kms devs > Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> > Cc: Juha-Pekka Heikkila <juha-pekka.heikkila@intel.com> > Cc: Swati Sharma <swati2.sharma@intel.com> > Cc: Jani Saarinen <jani.saarinen@intel.com> > Cc: Jani Nikula <jani.nikula@linux.intel.com> > > +other gpu devs > Cc: Rob Clark <robdclark@gmail.com> > Cc: Rob Clark <robdclark@chromium.org> > Cc: Helen Koike <helen.koike@collabora.com> > > +drm ci dev > Cc: Melissa Wen <mwen@igalia.com> > Cc: Maíra Canal <mcanal@igalia.com> That is a lot of people to CC, but ok. > >> Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com> >> --- >> Re-sending as the mailing list server rejected the message >> yesterday due to a probably full disk... >> >> v10: >> - fix memory leaks from asprintf (Thank you Dominik Karol!) >> - fix comments for consistency (Thank you Dominik Karol!) >> >> v9: >> - do not report new hardware when loading/unloading kmod changes >> the string of the GPU name. I accidentally reintroduced this >> issue when refactoring to use linked lists. >> - add tools/lsfacts: 9 lines of code that print either the facts >> or that no facts were found. >> - fix code comments describing functions >> - fix white space issues >> >> v8: >> - fix white space issues >> >> v7: >> - refactor to use linked lists provided by igt_lists >> - Added function arguments to code comments >> - updated commit message >> >> v6: >> - sort includes in igt_facts.c alphabetically >> - add facts for kernel taints using igt_kernel_tainted() and >> igt_explain_taints() >> >> v5: >> - fix the broken patch format from v4 >> >> v4: >> - fix a bug on delete_fact() >> - drop glib and calls to g_ functions >> - change commit message to indicate that report only on fact changes >> - use consistent format for reporting changes >> - fix SPDX header format >> >> v3: >> - refreshed commit message >> - changed format SPDX string >> - removed license text >> - replace last_test assignment when null by two ternary operators >> - added function descriptions following example found elsewhere in >> the code >> - added igt_assert to catch failures to realloc() >> >> v2: >> - add lib/tests/igt_facts.c for basic unit testing >> - bugfix: do not report a new gpu when the driver changes the gpu name >> - bugfix: do not report the pci_id twice on the gpu name >> >> lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++ >> lib/igt_facts.h | 47 +++ >> lib/meson.build | 1 + >> lib/tests/igt_facts.c | 15 + >> lib/tests/meson.build | 1 + > > Please split your patchset into three, add above files to > first patch: > > [PATCH i-g-t v11 1/3] lib/igt_facts: add fact checking lib > >> runner/executor.c | 10 + > > [PATCH i-g-t v11 2/3] runner/executor: Check facts at tests run > >> tools/lsfacts.c | 25 ++ >> tools/meson.build | 1 + > > [PATCH i-g-t v11 3/3] tools/lsfacts: Add facts listing tool > > and it all with cover letter. No. Why would I do that? Both the change to runner/executor.c and the new tool are very small, not justifying a separate patch. They also match well in functionality also not justifying a different patch. What gain are you trying to achieve here? I really like a single patch in this case. > >> 8 files changed, 855 insertions(+) >> create mode 100644 lib/igt_facts.c >> create mode 100644 lib/igt_facts.h >> create mode 100644 lib/tests/igt_facts.c >> create mode 100644 tools/lsfacts.c >> >> diff --git a/lib/igt_facts.c b/lib/igt_facts.c >> new file mode 100644 >> index 000000000..4749d3417 >> --- /dev/null >> +++ b/lib/igt_facts.c >> @@ -0,0 +1,755 @@ >> +// SPDX-License-Identifier: MIT >> +// Copyright © 2024 Intel Corporation > > Copyright is usally formatted in C-comment style /* comment */ > to allow extending it (or adding Author: fields, etc) See other > C files for example tests/device_reset.c > So here: > > /* > * Copyright © 2024 Intel Corporation > */ > > Do the same in other new files. No sorry, this is because of you asked me to please kernel's checkpatch.pl. I am not sending another revision for this nit that changes the comment style. Have you seen how the comment style varies in our code base, including recent commits? Why being so picky with my patch? That being said, if you want to implement these changes yourself please go for it. Thank you, Peter > > Regards, > Kamil > >> + >> +#include <ctype.h> >> +#include <libudev.h> >> +#include <stdio.h> >> +#include <sys/time.h> >> +#include <time.h> >> + >> +#include "igt_core.h" >> +#include "igt_device_scan.h" >> +#include "igt_facts.h" >> +#include "igt_kmod.h" >> +#include "igt_list.h" >> +#include "igt_taints.h" >> + >> +static struct igt_list_head igt_facts_list_drm_card_head; >> +static struct igt_list_head igt_facts_list_kmod_head; >> +static struct igt_list_head igt_facts_list_ktaint_head; >> +static struct igt_list_head igt_facts_list_pci_gpu_head; >> + >> + >> +/** >> + * igt_facts_lists_init: >> + * >> + * Initialize igt_facts linked lists. >> + * >> + * Returns: void >> + */ >> +void igt_facts_lists_init(void) >> +{ >> + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head); >> + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head); >> + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head); >> + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head); >> +} >> + >> + >> +/** >> + * igt_facts_log: >> + * @last_test: name of the test that triggered the fact >> + * @name: name of the fact >> + * @new_value: new value of the fact >> + * @old_value: old value of the fact >> + * >> + * Reports fact changes: >> + * - new fact: if old_value is NULL and new_value is not NULL >> + * - deleted fact: if new_value is NULL and old_value is not NULL >> + * - changed fact: if new_value is different from old_value >> + * >> + * Returns: void >> + */ >> +static void igt_facts_log(const char *last_test, const char *name, >> + const char *new_value, const char *old_value) >> +{ > > ...cut... > >> -- >> 2.34.1 >> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH i-g-t v10] igt-runner fact checking 2024-12-06 13:16 ` Peter Senna Tschudin @ 2024-12-06 16:46 ` Kamil Konieczny 0 siblings, 0 replies; 13+ messages in thread From: Kamil Konieczny @ 2024-12-06 16:46 UTC (permalink / raw) To: Peter Senna Tschudin Cc: igt-dev@lists.freedesktop.org, Ryszard Knop, Janusz Krzysztofik, Zbigniew Kempczyński, Lucas De Marchi, luciano.coelho, nirmoy.das, stuart.summers, himal.prasad.ghimiray, dominik.karol.piatkowski, katarzyna.piecielska, Petri Latvala, Juha-Pekka Heikkila, Juha-Pekka Heikkila, Swati Sharma, Jani Saarinen, Jani Nikula, Rob Clark, Rob Clark, Helen Koike, Melissa Wen, Maíra Canal Hi Peter, On 2024-12-06 at 14:16:27 +0100, Peter Senna Tschudin wrote: > Hi Kamil, > > Thank you for your comments. I appreciate the opportunity to clarify and address > your concerns. > > On 06.12.2024 12:42, Kamil Konieczny wrote: > > Hi Peter, > > On 2024-12-05 at 05:54:26 +0100, Peter Senna Tschudin wrote: > > > > overall looks good, I have few nits, see below. > > Thank you! Our CI tested it thoroughly, it creates value, and it works well. > It is time to merge it. > > > > > As for subject, it usally informs what changed, so "igt-runner fact checking" > > is a little misleading. Please see my ask about that below. > > Please explain what is misleading about the subject. It looks good to me. > Subject suggest that this is a change only for runner, while it also adds a tool lsfacts. > > > >> When using igt-runner, collect facts before each test and after the s/igt-runner/igt_runner/ > >> last test, and report when facts change. The facts are: Add newline here. > >> - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20) > >> - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1 > >> - Kernel taints: kernel.is_tainted.taint_warn: true > >> - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true imho here it should be general description what it will print, like: - GPUs on PCI bus: 'hardware PCI bus' 'GPU name' - Associations between PCI GPU and DRM card: 'PCI bus': 'card number' - Kernel taints: 'true' or 'false' - GPU kernel modules loaded: 'driver name' and then you could provide an example. > >> > >> This change imposes little execution overhead and adds just a few > > > > imho 'little overhead' is beause facts are only four but I agree > > we could change implementation when its number grows. > > Little overhead is because of how I implemented the code. The execution > is efficient and produces as little text output as possible. > > > > >> lines of logging. The facts will be printed on normal igt-runner > >> output. Here is a real example from our CI showing > >> hotreplug-lateclose changing the DRM card number and tainting the > >> kernel on the abort path: > >> > >> [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose) > >> [245.383596] Starting subtest: hotreplug-lateclose > >> [249.843361] Aborting: Lockdep not active > >> [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1 > >> [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true > >> [249.859075] Closing watchdogs Where is a note about adding new tool? Patch should describe all changes. I would prefer to see such note at beginning of description. Btw splitting this into few functional patches will gain that naturally. > >> > >> CC: Ryszard Knop <ryszard.knop@intel.com> > >> CC: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> > >> CC: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> > >> CC: Lucas De Marchi <lucas.demarchi@intel.com> > >> CC: luciano.coelho@intel.com > >> CC: nirmoy.das@intel.com > >> CC: stuart.summers@intel.com > >> CC: himal.prasad.ghimiray@intel.com > >> CC: dominik.karol.piatkowski@intel.com > >> CC: katarzyna.piecielska@intel.com > > > > +cc Petri + kms devs > > Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> > > Cc: Juha-Pekka Heikkila <juha-pekka.heikkila@intel.com> > > Cc: Swati Sharma <swati2.sharma@intel.com> > > Cc: Jani Saarinen <jani.saarinen@intel.com> > > Cc: Jani Nikula <jani.nikula@linux.intel.com> > > > > +other gpu devs > > Cc: Rob Clark <robdclark@gmail.com> > > Cc: Rob Clark <robdclark@chromium.org> > > Cc: Helen Koike <helen.koike@collabora.com> > > > > +drm ci dev > > Cc: Melissa Wen <mwen@igalia.com> > > Cc: Maíra Canal <mcanal@igalia.com> > > That is a lot of people to CC, but ok. > Well I do not know if it will help other igt_runner users, so it is better to ask. > > > >> Signed-off-by: Peter Senna Tschudin <peter.senna@linux.intel.com> > >> --- > >> Re-sending as the mailing list server rejected the message > >> yesterday due to a probably full disk... > >> > >> v10: > >> - fix memory leaks from asprintf (Thank you Dominik Karol!) > >> - fix comments for consistency (Thank you Dominik Karol!) > >> > >> v9: > >> - do not report new hardware when loading/unloading kmod changes > >> the string of the GPU name. I accidentally reintroduced this > >> issue when refactoring to use linked lists. > >> - add tools/lsfacts: 9 lines of code that print either the facts > >> or that no facts were found. > >> - fix code comments describing functions > >> - fix white space issues > >> > >> v8: > >> - fix white space issues > >> > >> v7: > >> - refactor to use linked lists provided by igt_lists > >> - Added function arguments to code comments > >> - updated commit message > >> > >> v6: > >> - sort includes in igt_facts.c alphabetically > >> - add facts for kernel taints using igt_kernel_tainted() and > >> igt_explain_taints() > >> > >> v5: > >> - fix the broken patch format from v4 > >> > >> v4: > >> - fix a bug on delete_fact() > >> - drop glib and calls to g_ functions > >> - change commit message to indicate that report only on fact changes > >> - use consistent format for reporting changes > >> - fix SPDX header format > >> > >> v3: > >> - refreshed commit message > >> - changed format SPDX string > >> - removed license text > >> - replace last_test assignment when null by two ternary operators > >> - added function descriptions following example found elsewhere in > >> the code > >> - added igt_assert to catch failures to realloc() > >> > >> v2: > >> - add lib/tests/igt_facts.c for basic unit testing > >> - bugfix: do not report a new gpu when the driver changes the gpu name > >> - bugfix: do not report the pci_id twice on the gpu name > >> > >> lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++ > >> lib/igt_facts.h | 47 +++ > >> lib/meson.build | 1 + > >> lib/tests/igt_facts.c | 15 + > >> lib/tests/meson.build | 1 + > > > > Please split your patchset into three, add above files to > > first patch: > > > > [PATCH i-g-t v11 1/3] lib/igt_facts: add fact checking lib > > > >> runner/executor.c | 10 + > > > > [PATCH i-g-t v11 2/3] runner/executor: Check facts at tests run > > > >> tools/lsfacts.c | 25 ++ > >> tools/meson.build | 1 + > > > > [PATCH i-g-t v11 3/3] tools/lsfacts: Add facts listing tool > > > > and it all with cover letter. > > No. Why would I do that? Both the change to runner/executor.c and > the new tool are very small, not justifying a separate patch. They > also match well in functionality also not justifying a different > patch. What gain are you trying to achieve here? I really like a > single patch in this case. > You could split it in other ways, first adding lib and tool lsfacts, then small patch change in runner/executor.c It is also about a risk, change in igt_runner is a main risk and in case we need a revert, I would prefer to revert small patch, one from 2 or 3 rather than one big that added all. Additional gain is more clear git log history. > > > >> 8 files changed, 855 insertions(+) > >> create mode 100644 lib/igt_facts.c > >> create mode 100644 lib/igt_facts.h > >> create mode 100644 lib/tests/igt_facts.c > >> create mode 100644 tools/lsfacts.c > >> > >> diff --git a/lib/igt_facts.c b/lib/igt_facts.c > >> new file mode 100644 > >> index 000000000..4749d3417 > >> --- /dev/null > >> +++ b/lib/igt_facts.c > >> @@ -0,0 +1,755 @@ > >> +// SPDX-License-Identifier: MIT > >> +// Copyright © 2024 Intel Corporation > > > > Copyright is usally formatted in C-comment style /* comment */ > > to allow extending it (or adding Author: fields, etc) See other > > C files for example tests/device_reset.c > > So here: > > > > /* > > * Copyright © 2024 Intel Corporation > > */ > > > > Do the same in other new files. > > No sorry, this is because of you asked me to please kernel's > checkpatch.pl. I am not sending another revision for this nit > that changes the comment style. Have you seen how the comment > style varies in our code base, including recent commits? Why > being so picky with my patch? > I try to look into new patches but I cannot take care of every one. checkpatch.pl checks SPDX line, while 'Copyright' is in separate comment. Check for yourself: grep Copyright lib/*c grep Copyright lib/*h None of it has '//' as a starting comment. And I checked my replay to your v1, I asked about dropping text _after_ Copyright, I did not wrote any about using '//' for it. > That being said, if you want to implement these changes yourself > please go for it. Well, no. I just skimmed over your changes and see there is list of gpu drivers there: +const char *igt_fact_kmod_list[] = { + "amdgpu", + "i915", + "nouveau", + "radeon", + "xe", + "\0" +}; It should be in sync with lib/drmtest.c } modules[] = { { DRIVER_AMDGPU, "amdgpu" }, { DRIVER_INTEL, "i915", modprobe_i915 }, { DRIVER_MSM, "msm" }, { DRIVER_PANFROST, "panfrost" }, { DRIVER_V3D, "v3d" }, { DRIVER_VC4, "vc4" }, { DRIVER_VGEM, "vgem" }, { DRIVER_VMWGFX, "vmwgfx" }, { DRIVER_XE, "xe" }, {} or at least add a comment why it is not. Regards, Kamil > > Thank you, > > Peter > > > > > Regards, > > Kamil > > > >> + > >> +#include <ctype.h> > >> +#include <libudev.h> > >> +#include <stdio.h> > >> +#include <sys/time.h> > >> +#include <time.h> > >> + > >> +#include "igt_core.h" > >> +#include "igt_device_scan.h" > >> +#include "igt_facts.h" > >> +#include "igt_kmod.h" > >> +#include "igt_list.h" > >> +#include "igt_taints.h" > >> + > >> +static struct igt_list_head igt_facts_list_drm_card_head; > >> +static struct igt_list_head igt_facts_list_kmod_head; > >> +static struct igt_list_head igt_facts_list_ktaint_head; > >> +static struct igt_list_head igt_facts_list_pci_gpu_head; > >> + > >> + > >> +/** > >> + * igt_facts_lists_init: > >> + * > >> + * Initialize igt_facts linked lists. > >> + * > >> + * Returns: void > >> + */ > >> +void igt_facts_lists_init(void) > >> +{ > >> + IGT_INIT_LIST_HEAD(&igt_facts_list_drm_card_head); > >> + IGT_INIT_LIST_HEAD(&igt_facts_list_kmod_head); > >> + IGT_INIT_LIST_HEAD(&igt_facts_list_ktaint_head); > >> + IGT_INIT_LIST_HEAD(&igt_facts_list_pci_gpu_head); > >> +} > >> + > >> + > >> +/** > >> + * igt_facts_log: > >> + * @last_test: name of the test that triggered the fact > >> + * @name: name of the fact > >> + * @new_value: new value of the fact > >> + * @old_value: old value of the fact > >> + * > >> + * Reports fact changes: > >> + * - new fact: if old_value is NULL and new_value is not NULL > >> + * - deleted fact: if new_value is NULL and old_value is not NULL > >> + * - changed fact: if new_value is different from old_value > >> + * > >> + * Returns: void > >> + */ > >> +static void igt_facts_log(const char *last_test, const char *name, > >> + const char *new_value, const char *old_value) > >> +{ > > > > ...cut... > > > >> -- > >> 2.34.1 > >> > ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-12-10 13:42 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <5a111c35-7245-4ada-a2a0-3fd0fd5bbeab@linux.intel.com>
2024-12-05 14:05 ` [PATCH i-g-t v10] igt-runner fact checking Janusz Krzysztofik
2024-12-06 5:45 ` Peter Senna Tschudin
2024-12-09 9:17 ` Janusz Krzysztofik
2024-12-09 11:06 ` Peter Senna Tschudin
2024-12-09 13:50 ` Janusz Krzysztofik
2024-12-10 8:38 ` Peter Senna Tschudin
2024-12-10 9:50 ` Janusz Krzysztofik
2024-12-10 13:41 ` Knop, Ryszard
2024-11-02 11:37 [PATCH i-g-t] " Peter Senna Tschudin
2024-12-05 4:54 ` [PATCH i-g-t v10] " Peter Senna Tschudin
2024-12-05 9:08 ` Piatkowski, Dominik Karol
2024-12-06 11:42 ` Kamil Konieczny
2024-12-06 13:16 ` Peter Senna Tschudin
2024-12-06 16:46 ` Kamil Konieczny
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox