Igt-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Dong, Zhanjun" <zhanjun.dong@intel.com>
To: "Cavitt, Jonathan" <jonathan.cavitt@intel.com>,
	"igt-dev@lists.freedesktop.org" <igt-dev@lists.freedesktop.org>
Subject: Re: [PATCH i-g-t v10] tests/intel/xe_exec_capture: Add xe_exec_capture test
Date: Wed, 8 Jan 2025 18:50:36 -0500	[thread overview]
Message-ID: <a24b3e36-6d83-43ae-b952-ab56f35fe820@intel.com> (raw)
In-Reply-To: <CH0PR11MB54446DFF8DE09B3A0CC217B5E5122@CH0PR11MB5444.namprd11.prod.outlook.com>



On 2025-01-08 1:23 p.m., Cavitt, Jonathan wrote:
> -----Original Message-----
> From: igt-dev <igt-dev-bounces@lists.freedesktop.org> On Behalf Of Zhanjun Dong
> Sent: Tuesday, January 7, 2025 4:04 PM
> To: igt-dev@lists.freedesktop.org
> Cc: Dong, Zhanjun <zhanjun.dong@intel.com>
> Subject: [PATCH i-g-t v10] tests/intel/xe_exec_capture: Add xe_exec_capture test
>>
>> Submit cmds to GPU to cause engine reset, check generated devcoredump
>> register dump, check against expected values or within the range.
>>
>> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
>>
>> ---
>>
>> Changes from prior revs:
>>   v10:- Move job timeout save/restore out of subtest, to avoid being bypassed
>>         by failed assertion
>>         Save/restore job timeout for each engine class
>>         Remove testing on multiple GPUs, to be put back after further discussion.
>>   v9:-  Reduced job timeout to 2 seconds to speedup test
>>         Add info print to show test is running on single/multiple GPU
>>   v8:-  Move change list below ---
>>   v7:-  Fix typo and removed unused macros
>>   v6:-  Adjust start_line to start from 0
>>         Use 7 bit engine_cid, start with random number
>>         Add ioerror detect on fgets
>>         Reorgnize the regular expression
>>         Remove unnecessary radom seed init
>>   v5:-  Detect devcoredump matches the testing engine
>>         Engine will run with random cid
>>   v4:-  Support runs on multiple GPU
>>         Load all devcoredump content to buffer
>>         Alloc line buffer dynamic vs static global memory
>>         Changed to igt_assert_f to provide more info if failed
>>   v3:-  Remove call to bash and awk
>>         Add regular express parse
>>         Detect devcoredump through card index
>>         Add devcoredump removal check
>>   v2:-  Fix CI.build error
>>         Add multiple GPU card support
>> ---
>>   tests/intel/xe_exec_capture.c | 519 ++++++++++++++++++++++++++++++++++
>>   tests/meson.build             |   1 +
>>   2 files changed, 520 insertions(+)
>>   create mode 100644 tests/intel/xe_exec_capture.c
>>
>> diff --git a/tests/intel/xe_exec_capture.c b/tests/intel/xe_exec_capture.c
>> new file mode 100644
>> index 000000000..b0642b406
>> --- /dev/null
>> +++ b/tests/intel/xe_exec_capture.c
>> @@ -0,0 +1,519 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2024 Intel Corporation
>> + */
>> +
>> +/**
>> + * TEST: Basic tests for GuC based register capture
>> + * Category: Core
>> + * Mega feature: General Core features
>> + * Sub-category: CMD submission
>> + * Functionality: Debug
>> + * Test category: functionality test
>> + */
>> +
>> +#include <ctype.h>
>> +#include <fcntl.h>
...

>> +
>> +igt_main
>> +{
>> +	int xe;
>> +	struct drm_xe_engine_class_instance *hwe;
>> +	u64 timeouts[DRM_XE_ENGINE_CLASS_VM_BIND] = {0};
>> +
>> +	igt_fixture {
>> +		xe = drm_open_driver(DRIVER_XE);
>> +		xe_for_each_engine(xe, hwe) {
>> +			/* Skip kernel only classes */
>> +			if (hwe->engine_class >= DRM_XE_ENGINE_CLASS_VM_BIND)
>> +				continue;
>> +			/* Skip classes already set */
>> +			if (timeouts[hwe->engine_class])
>> +				continue;
>> +			/* Save original timeout value */
>> +			timeouts[hwe->engine_class] = xe_sysfs_get_job_timeout_ms(xe, hwe);
>> +			/* Reduce timeout value to speedup test */
>> +			xe_sysfs_set_job_timeout_ms(xe, hwe, CAPTURE_JOB_TIMEOUT);
>> +
>> +			igt_debug("Reduced %s class timeout from %ld to %d\n",
>> +				  xe_engine_class_name(hwe->engine_class),
>> +				  timeouts[hwe->engine_class], CAPTURE_JOB_TIMEOUT);
>> +		}
>> +	}
>> +
>> +	igt_subtest("reset")
>> +		test_card(xe);
>> +
>> +	igt_fixture {
>> +		xe_for_each_engine(xe, hwe) {
>> +			/* Skip kernel only classes */
>> +			if (hwe->engine_class >= DRM_XE_ENGINE_CLASS_VM_BIND)
>> +				continue;
>> +			/* Skip classes already set */
>> +			if (timeouts[hwe->engine_class] == 0)
>> +				continue;
>> +			/* Restore original timeout value */
>> +			xe_sysfs_set_job_timeout_ms(xe, hwe, timeouts[hwe->engine_class]);
> 
> We should probably assert that we're correctly setting the timeout to the expected
> value after writing, and abort further IGT testing if the sysfs setting is improperly
> reset:
> 
> """
> 		xe_for_each_engine(xe, hwe) {
> 			u64 store, timeout;
> 
> 			/* Skip kernel only classes */
> 			if (hwe->engine_class >= DRM_XE_ENGINE_CLASS_VM_BIND)
> 				continue;
> 
> 			timeout = timeouts[hwe->engine_class]);
> 			/* Skip classes already set */
> 			if (!timeout)
> 				continue;
> 
> 			/* Restore original timeout value */
> 			xe_sysfs_set_job_timeout_ms(xe, hwe, timeout);
> 
> 			/* Assert successful restore */
> 			store = xe_sysfs_get_job_timeout_ms(xe, hwe);
> 			igt_abort_on_f(timeout != store,
> 				       " job_timeout_ms not restored!\n");
> 
> 			timeouts[hwe->engine_class] = 0;
> 		}
> """
> 
> I recognize that igt_sysfs_set_u64 is internally performing an assert on the write,
> but we need to perform an abort here for the following reasons:
> 1) A failure to restore the job_timeout_ms value can cause spurious test failures
> later in IGT execution, which we need to abort to prevent.
> 2) The assert on the write only determines that the write occurred, not necessarily
> that it was entirely successful (I.E. if we attempt to write 100 ms into job_timeout_ms,
> we could get a partial write of only 10 ms or 1 ms and not catch it).
Good point, it is important to abort the IGT test if restore failed, 
save us from debug on false alarm.

> 
> I don't see any reason to block on this patch beyond this, however, so just add
> the necessary assert, and this is
> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> -Jonathan Cavitt

Thanks for review.

I will post next rev as recommended.

Regards,
Zhanjun Dong

> 
>> +			igt_debug("Restored %s class timeout to %ld\n",
>> +				  xe_engine_class_name(hwe->engine_class),
>> +				  timeouts[hwe->engine_class]);
>> +
>> +			timeouts[hwe->engine_class] = 0;
>> +		}
>> +
>> +		drm_close_driver(xe);
>> +	}
>> +}
>> diff --git a/tests/meson.build b/tests/meson.build
>> index 89bba6454..895d911f8 100644
>> --- a/tests/meson.build
>> +++ b/tests/meson.build
>> @@ -286,6 +286,7 @@ intel_xe_progs = [
>>   	'xe_exec_atomic',
>>   	'xe_exec_balancer',
>>   	'xe_exec_basic',
>> +	'xe_exec_capture',
>>   	'xe_exec_compute_mode',
>>   	'xe_exec_fault_mode',
>>   	'xe_exec_mix_modes',
>> -- 
>> 2.34.1
>>
>>


  reply	other threads:[~2025-01-08 23:51 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-08  0:04 [PATCH i-g-t v10] tests/intel/xe_exec_capture: Add xe_exec_capture test Zhanjun Dong
2025-01-08  2:08 ` ✗ Xe.CI.BAT: failure for tests/intel/xe_exec_capture: Add xe_exec_capture test (rev9) Patchwork
2025-01-08  2:10 ` ✓ i915.CI.BAT: success " Patchwork
2025-01-08 11:10 ` ✗ i915.CI.Full: failure " Patchwork
2025-01-08 18:23 ` [PATCH i-g-t v10] tests/intel/xe_exec_capture: Add xe_exec_capture test Cavitt, Jonathan
2025-01-08 23:50   ` Dong, Zhanjun [this message]
2025-01-09 20:51 ` ✗ Xe.CI.Full: failure for tests/intel/xe_exec_capture: Add xe_exec_capture test (rev9) Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a24b3e36-6d83-43ae-b952-ab56f35fe820@intel.com \
    --to=zhanjun.dong@intel.com \
    --cc=igt-dev@lists.freedesktop.org \
    --cc=jonathan.cavitt@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox