Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: John Harrison <john.c.harrison@intel.com>
To: Jani Nikula <jani.nikula@intel.com>,
	Satyanarayana K V P <satyanarayana.k.v.p@intel.com>,
	<intel-xe@lists.freedesktop.org>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Subject: Re: [PATCH v6] drm/xe: Add helper function to inject fault into ct_dead_capture()
Date: Fri, 6 Jun 2025 17:06:57 -0700	[thread overview]
Message-ID: <1e939cd3-1db4-432e-89c7-add155d67606@intel.com> (raw)
In-Reply-To: <ff47692cc6c70d10b3e58cf68e6d70ef79ebd997@intel.com>

On 6/6/2025 6:19 AM, Jani Nikula wrote:
> On Thu, 05 Jun 2025, John Harrison <john.c.harrison@intel.com> wrote:
>> On 5/24/2025 7:46 AM, Satyanarayana K V P wrote:
>>> When injecting fault to xe_guc_ct_send_recv() & xe_guc_mmio_send_recv()
>>> functions, the CI test systems are going out of space and crashing. To
>>> avoid this issue, a new helper function is created and when fault is
>>> injected into this xe_inject_fault() helper function, ct dead capture
>>> is avoided which suppresses ct dumps in the log.
>>>
>>> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
>>> Suggested-by: John Harrison <John.C.Harrison@Intel.com>
>> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
>>
>> This seems like the simplest and cleanest solution to me (for both the
>> KMD and the IGT sides). I don't know if Jani or Michal still have
>> objections to it.
> Simple it may be, but I still think it conflates two orthogonal things
> that should both be decided by userspace.
It is being decided by userspace. And they are not orthogonal but 
completely dependent.

> If userspace wants error injection, why should the kernel decide it
> means no error capture in dmesg? Especially when that decision is to
> tackle an arbitrary self-inflicted *userspace* issue i.e. disk space
> limitation during testing.
Ignoring the CI limitation, it is also unhelpful to have huge logs 
describing hardware state for a bug that is purely software injected.

The dump to dmesg only occurs when something has gone wrong that is not 
supposed to be possible. There should certainly not be any way that a 
user land test can cause such an event. It is there to debug the rare 
and hard to repro cases that cannot be debugged any other way.

Whereas, this specific test is about making sure the KMD does not die 
horribly when an unexpected fault occurs. It is not an attempt to debug 
such occurrences, merely a check that the universe survives them. Which 
means that any debug output related to the source of the error is 
fundamentally useless - the test itself is the source. So any debug 
output related to debugging the source is simply noise and not relevant 
to debugging why the driver died *after* such an error occurred.

As the fault injection test is the only way to inject such errors, by 
definition, it is sensible that the fault injection test is the only 
thing that needs to suppress such output.

In other words, the test and the output suppression are inherently 
connected and not orthogonal.

So having extra complication to make the suppression mechanism generic 
and unrelated to the injection test is wasted effort and unnecessary 
complication.

If you want a more general purpose way to control debug output for 
specific local testing, we already have that via config options and 
kernel/drm debug levels. This is specifically and solely about improving 
the usability of the error injection test (for which config options are 
not viable because those are fixed across CI runs, likewise debug levels 
are not useful because we do need to see the normal driver debug output 
to debug any test failures).

John.

>
> Mechanism, not policy.
>
> There, I've said it, but this is not a hill I'm going to die on.
>
>
> BR,
> Jani.
>
>


  reply	other threads:[~2025-06-07  0:07 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-24 14:46 [PATCH v6] drm/xe: Add helper function to inject fault into ct_dead_capture() Satyanarayana K V P
2025-05-24 14:35 ` ✓ CI.Patch_applied: success for drm/xe: Add helper function to inject fault into ct_dead_capture() (rev6) Patchwork
2025-05-24 14:35 ` ✓ CI.checkpatch: " Patchwork
2025-05-24 14:37 ` ✓ CI.KUnit: " Patchwork
2025-05-24 14:47 ` ✓ CI.Build: " Patchwork
2025-05-24 14:49 ` ✓ CI.Hooks: " Patchwork
2025-05-24 14:51 ` ✓ CI.checksparse: " Patchwork
2025-05-24 15:12 ` ✓ Xe.CI.BAT: " Patchwork
2025-05-24 23:52 ` ✗ Xe.CI.Full: failure " Patchwork
2025-06-05 20:04 ` [PATCH v6] drm/xe: Add helper function to inject fault into ct_dead_capture() John Harrison
2025-06-06 13:19   ` Jani Nikula
2025-06-07  0:06     ` John Harrison [this message]
2025-06-11  7:35       ` Jani Nikula

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1e939cd3-1db4-432e-89c7-add155d67606@intel.com \
    --to=john.c.harrison@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jani.nikula@intel.com \
    --cc=michal.wajdeczko@intel.com \
    --cc=satyanarayana.k.v.p@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox