From: John Harrison <john.c.harrison@intel.com>
To: Jani Nikula <jani.nikula@intel.com>,
Satyanarayana K V P <satyanarayana.k.v.p@intel.com>,
<intel-xe@lists.freedesktop.org>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Subject: Re: [PATCH v6] drm/xe: Add helper function to inject fault into ct_dead_capture()
Date: Fri, 6 Jun 2025 17:06:57 -0700 [thread overview]
Message-ID: <1e939cd3-1db4-432e-89c7-add155d67606@intel.com> (raw)
In-Reply-To: <ff47692cc6c70d10b3e58cf68e6d70ef79ebd997@intel.com>
On 6/6/2025 6:19 AM, Jani Nikula wrote:
> On Thu, 05 Jun 2025, John Harrison <john.c.harrison@intel.com> wrote:
>> On 5/24/2025 7:46 AM, Satyanarayana K V P wrote:
>>> When injecting fault to xe_guc_ct_send_recv() & xe_guc_mmio_send_recv()
>>> functions, the CI test systems are going out of space and crashing. To
>>> avoid this issue, a new helper function is created and when fault is
>>> injected into this xe_inject_fault() helper function, ct dead capture
>>> is avoided which suppresses ct dumps in the log.
>>>
>>> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
>>> Suggested-by: John Harrison <John.C.Harrison@Intel.com>
>> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
>>
>> This seems like the simplest and cleanest solution to me (for both the
>> KMD and the IGT sides). I don't know if Jani or Michal still have
>> objections to it.
> Simple it may be, but I still think it conflates two orthogonal things
> that should both be decided by userspace.
It is being decided by userspace. And they are not orthogonal but
completely dependent.
> If userspace wants error injection, why should the kernel decide it
> means no error capture in dmesg? Especially when that decision is to
> tackle an arbitrary self-inflicted *userspace* issue i.e. disk space
> limitation during testing.
Ignoring the CI limitation, it is also unhelpful to have huge logs
describing hardware state for a bug that is purely software injected.
The dump to dmesg only occurs when something has gone wrong that is not
supposed to be possible. There should certainly not be any way that a
user land test can cause such an event. It is there to debug the rare
and hard to repro cases that cannot be debugged any other way.
Whereas, this specific test is about making sure the KMD does not die
horribly when an unexpected fault occurs. It is not an attempt to debug
such occurrences, merely a check that the universe survives them. Which
means that any debug output related to the source of the error is
fundamentally useless - the test itself is the source. So any debug
output related to debugging the source is simply noise and not relevant
to debugging why the driver died *after* such an error occurred.
As the fault injection test is the only way to inject such errors, by
definition, it is sensible that the fault injection test is the only
thing that needs to suppress such output.
In other words, the test and the output suppression are inherently
connected and not orthogonal.
So having extra complication to make the suppression mechanism generic
and unrelated to the injection test is wasted effort and unnecessary
complication.
If you want a more general purpose way to control debug output for
specific local testing, we already have that via config options and
kernel/drm debug levels. This is specifically and solely about improving
the usability of the error injection test (for which config options are
not viable because those are fixed across CI runs, likewise debug levels
are not useful because we do need to see the normal driver debug output
to debug any test failures).
John.
>
> Mechanism, not policy.
>
> There, I've said it, but this is not a hill I'm going to die on.
>
>
> BR,
> Jani.
>
>
next prev parent reply other threads:[~2025-06-07 0:07 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-24 14:46 [PATCH v6] drm/xe: Add helper function to inject fault into ct_dead_capture() Satyanarayana K V P
2025-05-24 14:35 ` ✓ CI.Patch_applied: success for drm/xe: Add helper function to inject fault into ct_dead_capture() (rev6) Patchwork
2025-05-24 14:35 ` ✓ CI.checkpatch: " Patchwork
2025-05-24 14:37 ` ✓ CI.KUnit: " Patchwork
2025-05-24 14:47 ` ✓ CI.Build: " Patchwork
2025-05-24 14:49 ` ✓ CI.Hooks: " Patchwork
2025-05-24 14:51 ` ✓ CI.checksparse: " Patchwork
2025-05-24 15:12 ` ✓ Xe.CI.BAT: " Patchwork
2025-05-24 23:52 ` ✗ Xe.CI.Full: failure " Patchwork
2025-06-05 20:04 ` [PATCH v6] drm/xe: Add helper function to inject fault into ct_dead_capture() John Harrison
2025-06-06 13:19 ` Jani Nikula
2025-06-07 0:06 ` John Harrison [this message]
2025-06-11 7:35 ` Jani Nikula
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1e939cd3-1db4-432e-89c7-add155d67606@intel.com \
--to=john.c.harrison@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=jani.nikula@intel.com \
--cc=michal.wajdeczko@intel.com \
--cc=satyanarayana.k.v.p@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox