From: John Harrison <john.c.harrison@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>,
Satyanarayana K V P <satyanarayana.k.v.p@intel.com>,
<intel-xe@lists.freedesktop.org>
Cc: Aditya Chauhan <aditya.chauhan@intel.com>,
Jani Nikula <jani.nikula@intel.com>,
Jonathan Cavitt <jonathan.cavitt@intel.com>
Subject: Re: [PATCH v3] drm/xe: Add helper function to inject fault into ct_dead_capture()
Date: Thu, 8 May 2025 17:37:51 -0700 [thread overview]
Message-ID: <9f201078-866b-440d-8253-fe7cd1d90a87@intel.com> (raw)
In-Reply-To: <3834b63f-2c0d-4a91-81ea-eb8c57f12b31@intel.com>
On 5/7/2025 8:32 AM, Michal Wajdeczko wrote:
> On 07.05.2025 15:15, Satyanarayana K V P wrote:
>> When injecting fault to xe_guc_ct_send_recv() & xe_guc_mmio_send_recv()
>> functions, the CI test systems are going out of space and crashing. To
>> avoid this issue, a new helper function is created and when fault is
>> injected into this xe_should_fail_ct_dead_capture() helper function,
>> ct dead capture is avoided which suppresses ct dumps in the log.
>>
>> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
>> Suggested-by: John Harrison <John.C.Harrison@Intel.com>
>> Tested-by: Aditya Chauhan <aditya.chauhan@intel.com>
>>
>> ---
>> Cc: Jani Nikula <jani.nikula@intel.com>
>>
>> V2 -> V3:
>> - Added inline function to avoid compilation error in the absence of
>> CONFIG_FUNCTION_ERROR_INJECTION.
>>
>> V1 -> V2:
>> - Fixed review comments.
>> ---
>> drivers/gpu/drm/xe/xe_guc_ct.c | 25 +++++++++++++++++++++++++
>> 1 file changed, 25 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>> index 2447de0ebedf..d959cc2e7b40 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>> @@ -1770,6 +1770,24 @@ void xe_guc_ct_print(struct xe_guc_ct *ct, struct drm_printer *p, bool want_ctb)
>> }
>>
>> #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
>> +/**
>> + * xe_should_fail_ct_dead_capture - Helper function to inject fault.
> this is a static function, likely we don't want full kernel-doc for it
Meaning just make it a regular comment? We definitely want documentation
of some sort.
>
>> + *
>> + * This is a helper function to inject fault into ct_dead_capture().
>> + * As fault is injected using this function, need to make sure that
>> + * the compiler does not optimize and make it as a inline function.
>> + * To prevent compile optimization, "noinline" is added.
>> + */
>> +#ifdef CONFIG_FUNCTION_ERROR_INJECTION
>> +static noinline int xe_should_fail_ct_dead_capture(void)
>> +{
>> + return 0;
>> +}
>> +ALLOW_ERROR_INJECTION(xe_should_fail_ct_dead_capture, ERRNO);
> hmm, but do we really need to abuse the error-injection framework to
> suppress GuC log dump? maybe cleaner option would be to just add module
> parameter (maybe under CONFIG_XE_DEBUG) that we can setup in the test
> and check here?
If we cannot under any circumstances have a module option for doing
something useful like adjusting the GuC log size then I don't see how
once can be accepted for something that is purely a hack for reducing CI
spam.
As per my earlier comments, having a generic 'error injection test in
progress' function would seem a cleaner way to do this. The CT code (or
any other similar code) can check that function and early exit as
appropriate, but the function itself is not specific to any particular
subsystem.
>
>> +#else
>> +static inline int xe_should_fail_ct_dead_capture(void) { return 0; }
>> +#endif
>> +
>> static void ct_dead_capture(struct xe_guc_ct *ct, struct guc_ctb *ctb, u32 reason_code)
>> {
>> struct xe_guc_log_snapshot *snapshot_log;
>> @@ -1778,6 +1796,13 @@ static void ct_dead_capture(struct xe_guc_ct *ct, struct guc_ctb *ctb, u32 reaso
>> unsigned long flags;
>> bool have_capture;
>>
>> + /*
>> + * Huge dump is getting generated when injecting error for guc CT/MMIO
>> + * functions. So, let us suppress the dump when fault is injected.
>> + */
>> + if (xe_should_fail_ct_dead_capture())
>> + return;
> shouldn't we exit *after* the below statement?
>
> we want to skip just a dump, not to skip 'broken' markup
This is true. The 'broken' flag still needs to be set - that even
happens when not in an XE_DEBUG build, and is a required part of the
error handling to prevent further cascading of errors.
John.
>
>> +
>> if (ctb)
>> ctb->info.broken = true;
>>
next prev parent reply other threads:[~2025-05-09 0:38 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-07 13:15 [PATCH v3] drm/xe: Add helper function to inject fault into ct_dead_capture() Satyanarayana K V P
2025-05-07 13:09 ` ✓ CI.Patch_applied: success for drm/xe: Add helper function to inject fault into ct_dead_capture() (rev3) Patchwork
2025-05-07 13:10 ` ✓ CI.checkpatch: " Patchwork
2025-05-07 13:12 ` ✓ CI.KUnit: " Patchwork
2025-05-07 13:20 ` ✓ CI.Build: " Patchwork
2025-05-07 13:22 ` ✓ CI.Hooks: " Patchwork
2025-05-07 13:24 ` ✓ CI.checksparse: " Patchwork
2025-05-07 13:48 ` ✓ Xe.CI.BAT: " Patchwork
2025-05-07 15:32 ` [PATCH v3] drm/xe: Add helper function to inject fault into ct_dead_capture() Michal Wajdeczko
2025-05-09 0:37 ` John Harrison [this message]
2025-05-08 9:38 ` ✗ Xe.CI.Full: failure for drm/xe: Add helper function to inject fault into ct_dead_capture() (rev3) Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9f201078-866b-440d-8253-fe7cd1d90a87@intel.com \
--to=john.c.harrison@intel.com \
--cc=aditya.chauhan@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=jani.nikula@intel.com \
--cc=jonathan.cavitt@intel.com \
--cc=michal.wajdeczko@intel.com \
--cc=satyanarayana.k.v.p@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox