Igt-dev Archive on lore.kernel.org
From: Peter Senna Tschudin <peter.senna@linux.intel.com>
To: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>,
	"igt-dev@lists.freedesktop.org" <igt-dev@lists.freedesktop.org>
Cc: "Ryszard Knop" <ryszard.knop@intel.com>,
	"Zbigniew Kempczyński" <zbigniew.kempczynski@intel.com>,
	"Lucas De Marchi" <lucas.demarchi@intel.com>,
	luciano.coelho@intel.com, nirmoy.das@intel.com,
	stuart.summers@intel.com, himal.prasad.ghimiray@intel.com,
	dominik.karol.piatkowski@intel.com,
	katarzyna.piecielska@intel.com
Subject: Re: [PATCH i-g-t v10] igt-runner fact checking
Date: Mon, 9 Dec 2024 12:06:33 +0100
Message-ID: <c3b14136-db8a-4d3d-82cb-038c2241fe76@linux.intel.com>
In-Reply-To: <15344780.JCcGWNJJiE@jkrzyszt-mobl2.ger.corp.intel.com>

Hi Janusz,

On 09.12.2024 10:17, Janusz Krzysztofik wrote:
> On Friday, 6 December 2024 06:45:31 CET Peter Senna Tschudin wrote:
>> Hi Janusz,
>>
>> Thank you for your detailed comments. I appreciate the opportunity
>> to clarify and address your concerns.
>>
>> On 05.12.2024 15:05, Janusz Krzysztofik wrote:
>>> Hi Peter,
>>>
>>> On Wednesday, 4 December 2024 19:44:53 CET Peter Senna Tschudin wrote:
>>>> When using igt-runner, collect facts before each test and after the
>>>> last test, and report when facts change. The facts are:
>>>>  - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
>>>>  - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
>>>>  - Kernel taints: kernel.is_tainted.taint_warn: true
>>>>  - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
>>>>
>>>> This change imposes little execution overhead and adds just a few
>>>> lines of logging. The facts will be printed on normal igt-runner
>>>> output. Here is a real example from our CI showing
>>>> hotreplug-lateclose changing the DRM card number
>>>
>>> Since you give that as an example of how helpful your facts can be, and follow 
>>> that with a kernel taint example, that may indicate you think, and users of 
>>> your facts may then be misled having that read, that the taint was related to
>>> the change of card number, while both had nothing to do with each other.
>>
>> Let’s take a step back to define the purpose and scope of igt-facts:
>>  - Definition of a fact from the dictionary: A fact is an objectively verifiable
>>    piece of information.
>>  - Purpose of igt-facts: Track which tests cause changes to the facts.
>>
>> The operation is straightforward: facts are collected before and after each test,
>> and any differences are logged. Here’s an example showing a fact change and a new
>> fact after running hotreplug-lateclose:
>>
>>  [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
>>  [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
>>
>> This output highlights the facts without implying causation between them. The
>> tool (and my commit message) neither explains relationships between facts nor
>> misleads users into assuming causation.
> 
> For me your commit message does.

Can you please provide some external support for your claims?

I am very careful when writing, and I am aware of my own limitations.
After reading what ChatGPT wrote about my text, I felt comfortable
that I was not misleading anyone. Please mind the wording of my
question: I did not hint at a desired outcome for ChatGPT.

My question:
-- // --
Can you please indicate any potential causality relationship
that the following text may offer?

Here is a real example from our CI shwoing hotreplug-lateclose
changing the DRM card number and tainting the kernel on the
abort path
-- // --

ChatGPT's answer:
-- // --
The text suggests a potential causality relationship between the
"hotreplug-lateclose" operation and its effects on the system:

1. Causal Event: The "hotreplug-lateclose" operation.
This likely refers to a process in the CI (Continuous Integration)
environment related to hotplugging devices and managing late-close
scenarios. Hotplugging involves dynamically adding or removing
hardware components without shutting down the system.

2. Effects:

* Changing the DRM card number: This implies that the operation
affects the enumeration or reordering of DRM (Direct Rendering
Manager) devices, causing the card number to be reassigned.

* Kernel Tainting on the abort path: Kernel tainting typically
indicates that the kernel has entered a state that is not
guaranteed to be stable or reliable. This could occur due
to improper handling of errors or unexpected conditions
during the "abort path" (likely a failure or cleanup process
in the operation).

Suggested Causal Chain:
1. The "hotreplug-lateclose" operation executes in the CI.
2. This causes a change in DRM card enumeration (potentially due
to how devices are initialized or re-initialized).
3. On the abort path of this operation, a condition arises that
taints the kernel (possibly due to resource cleanup issues,
unhandled errors, or driver misbehavior).

This text highlights a possible link between hotreplug-lateclose
handling and instabilities in DRM device management as well as
kernel state integrity.
-- // --



> 
> Can you please provide a full list of "facts" your code is supposed to handle?

This is already in the commit message, at the very beginning.
  
> Can you please explain why you selected just those "facts", not others?

Each fact was either something we were missing, such as a convenient way
to learn when something strange happened (for example, a GPU disappearing
from the PCI bus), or something that we believe may cause errors
downstream, such as a kernel taint or a change in the list of loaded
modules.

For the DRM card number association, we believe that there may be a caching
issue: we are trying to figure out whether the drm-reopen cache handles a
change in the DRM card number association well. We have weak information
pointing to a probable problem akin to missing cache invalidation.
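
To illustrate the before/after comparison discussed in this thread, here
is a minimal sketch in Python. It is not the igt-runner code: the function
name diff_facts and the dictionary representation are assumptions made
for illustration only. The "new" and "changed" labels follow the log lines
quoted earlier in this message; the "deleted" label is a hypothetical
choice for a fact that disappears.

```python
def diff_facts(before, after):
    """Compare two fact snapshots and return one log line per difference.

    `before` and `after` map a fact name to its value (both strings).
    This is an illustrative sketch, not the igt-runner implementation.
    """
    lines = []
    for name in sorted(before.keys() | after.keys()):
        if name not in before:
            # Fact appeared during the test.
            lines.append(f"new: {name}: {after[name]}")
        elif name not in after:
            # Fact disappeared during the test ("deleted" is a hypothetical label).
            lines.append(f"deleted: {name}: {before[name]}")
        elif before[name] != after[name]:
            # Fact kept its name but changed its value.
            lines.append(f"changed: {name}: {before[name]} -> {after[name]}")
    return lines


# Snapshots modelled on the hotreplug-lateclose example quoted above.
before = {"hardware.pci.drm_card_at_addr.0000:00:02.0": "card0"}
after = {
    "hardware.pci.drm_card_at_addr.0000:00:02.0": "card1",
    "kernel.is_tainted.taint_die": "true",
}

for line in diff_facts(before, after):
    print(line)
```

The sketch reports each difference on its own and records nothing about
how one difference relates to another.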


Thanks!

[...]
