public inbox for igt-dev@lists.freedesktop.org
 help / color / mirror / Atom feed
From: "Rudnicki, Piotr" <piotr.rudnicki@intel.com>
To: "Piatkowski, Dominik Karol" <dominik.karol.piatkowski@intel.com>,
	"igt-dev@lists.freedesktop.org" <igt-dev@lists.freedesktop.org>
Cc: <piotr.rudnicki@intel.com>
Subject: Re: [PATCH i-g-t 1/1] tests/intel/xe_eudebug_online: test_interrupt_other wait fix
Date: Wed, 28 Jan 2026 11:09:24 +0100	[thread overview]
Message-ID: <b13185e5-340d-4947-b420-a24658abb39a@intel.com> (raw)
In-Reply-To: <DS0PR11MB726440BC41A7576D0F9958B7D990A@DS0PR11MB7264.namprd11.prod.outlook.com>

Hi Dominik,

On 1/27/2026 2:42 PM, Piatkowski, Dominik Karol wrote:
> Hi Piotr,
> 
>> -----Original Message-----
>> From: igt-dev <igt-dev-bounces@lists.freedesktop.org> On Behalf Of Piotr
>> Rudnicki
>> Sent: Friday, January 23, 2026 12:35 PM
>> To: igt-dev@lists.freedesktop.org
>> Cc: Rudnicki, Piotr <piotr.rudnicki@intel.com>
>> Subject: [PATCH i-g-t 1/1] tests/intel/xe_eudebug_online:
>> test_interrupt_other wait fix
>>
>> The test was incorrectly waiting for debugee completion when the first
>> client should complete due to the reset. Fix the test to wait for the
>> correct client after reset and properly cleanup the debugee.
>>
>> Add pipe validation in pipe_signal() to prevent writes to closed pipes
>> after SIGKILL. Update pipe_close() to set descriptors to -1 after closing
>> to ensure the guard condition works correctly.
> 
> This commit does a lot of stuff. I'd split it into library and test parts.
> 

Good point. I'll split this into two commits.
Will send v2 with the split.

>>
>> Verify waitpid() return value to ensure the process was successfully
>> reaped before marking as done. Verify second workload started before
>> cleanup and skip session validation as SIGKILL causes incomplete event log.
>>
>> Signed-off-by: Piotr Rudnicki <piotr.rudnicki@intel.com>
>> ---
>>   lib/xe/xe_eudebug.c             | 16 ++++++++-----
>>   tests/intel/xe_eudebug_online.c | 40 +++++++++++++++++++++++++++-----
>> -
>>   2 files changed, 44 insertions(+), 12 deletions(-)
>>
>> diff --git a/lib/xe/xe_eudebug.c b/lib/xe/xe_eudebug.c
>> index 2abee7ba2..9f981d226 100644
>> --- a/lib/xe/xe_eudebug.c
>> +++ b/lib/xe/xe_eudebug.c
>> @@ -304,16 +304,22 @@ static int safe_pipe_read(int pipe[2], void *buf, int
>> nbytes, int timeout_ms)
>>
>>   static void pipe_signal(int pipe[2], uint64_t token)
>>   {
>> -	igt_assert(write(pipe[1], &token, sizeof(token)) == sizeof(token));
>> +	/* Skip signaling if pipe is closed (e.g., after SIGKILL) */
> 
> Are you sure that closing the pipe, especially with SIGKILL, changes the pipe value to negative?
> 

You're right.

SIGKILL terminates the child process and the OS closes its file 
descriptors, but the parent process's pipe array values remain unchanged.
The pipe[1] >= 0 check protects against pipes explicitly closed by 
pipe_close() (which sets them to -1), not pipes closed due to SIGKILL.

I will update the comment.

>> +	if (pipe[1] >= 0)
>> +		igt_assert(write(pipe[1], &token, sizeof(token)) ==
>> sizeof(token));
>>   }
>>
>>   static void pipe_close(int pipe[2])
>>   {
>> -	if (pipe[0] != -1)
>> -		close(pipe[0]);
>> +	if (pipe[0] >= 0) {
>> +		igt_assert_eq(close(pipe[0]), 0);
>> +		pipe[0] = -1;
>> +	}
>>
>> -	if (pipe[1] != -1)
>> -		close(pipe[1]);
>> +	if (pipe[1] >= 0) {
>> +		igt_assert_eq(close(pipe[1]), 0);
>> +		pipe[1] = -1;
>> +	}
>>   }
>>
>>   #define DEAD_CLIENT 0xccccdead
>> diff --git a/tests/intel/xe_eudebug_online.c b/tests/intel/xe_eudebug_online.c
>> index ff6c5ff19..1093df133 100644
>> --- a/tests/intel/xe_eudebug_online.c
>> +++ b/tests/intel/xe_eudebug_online.c
>> @@ -21,6 +21,7 @@
>>   #include "intel_pat.h"
>>   #include "intel_mocs.h"
>>   #include "gpgpu_shader.h"
>> +#include <sys/wait.h>
>>
>>   #define SHADER_NOP			(0 << 0)
>>   #define SHADER_BREAKPOINT		(1 << 0)
>> @@ -2014,7 +2015,7 @@ static void test_interrupt_other(int fd, struct
>> drm_xe_engine_class_instance *hw
>>   	xe_eudebug_debugger_start_worker(s->debugger);
>>   	xe_eudebug_client_start(s->client);
>>
>> -	/* wait for workload to start */
>> +	/* wait for the first client workload to start */
>>   	igt_for_milliseconds(STARTUP_TIMEOUT_MS) {
>>   		if (READ_ONCE(data->vm_fd) == -1 || READ_ONCE(data-
>>> target_size) == 0)
>>   			continue;
>> @@ -2034,6 +2035,7 @@ static void test_interrupt_other(int fd, struct
>> drm_xe_engine_class_instance *hw
>>   	igt_assert_eq(xe_eudebug_debugger_attach(s->debugger, debugee),
>> 0);
>>   	xe_eudebug_client_start(debugee);
>>
>> +	/* wait for the second client */
>>   	igt_for_milliseconds(STARTUP_TIMEOUT_MS) {
>>   		if (READ_ONCE(debugee_data->vm_fd) == -1 ||
>> READ_ONCE(debugee_data->target_size) == 0)
>>   			continue;
>> @@ -2044,8 +2046,8 @@ static void test_interrupt_other(int fd, struct
>> drm_xe_engine_class_instance *hw
>>   	igt_assert(debugee_data->exec_queue_handle != -1);
>>
>>   	/*
>> -	 * Interrupting the other client should return invalid state
>> -	 * as it is running in runalone mode
>> +	 * Interrupting the first client (currently running)
>> +	 * should return invalid state as it is running in runalone mode
>>   	 */
>>   	igt_assert_eq(__eu_ctl(s->debugger->fd, debugee_data-
>>> client_handle,
>>   		      debugee_data->exec_queue_handle, debugee_data-
>>> lrc_handle, NULL, 0,
>> @@ -2054,15 +2056,39 @@ static void test_interrupt_other(int fd, struct
>> drm_xe_engine_class_instance *hw
>>
>>   	xe_force_gt_reset_async(s->debugger->master_fd, debugee_data-
>>> hwe.gt_id);
>>
>> -	xe_eudebug_client_wait_done(debugee);
>> +	/* First client should complete because of the reset */
>> +	xe_eudebug_client_wait_done(s->client);
>> +
>> +	/* Check if second workload was started and is running */
> 
> Why not use the wait_for_workload_start?
> 
>> +	igt_for_milliseconds(STARTUP_TIMEOUT_MS) {
>> +		if (READ_ONCE(debugee_data->vm_fd) == -1 ||
>> +		    READ_ONCE(debugee_data->target_size) == 0)
>> +			continue;
>> +
>> +		if (pread(debugee_data->vm_fd, &val, sizeof(val),
>> +			  debugee_data->target_offset) == sizeof(val))
>> +			if (val != 0)
>> +				break;
>> +	}
>> +	igt_assert_f(val != 0, "Debugee workload execution is not yet
>> started\n");
>> +
>> +	/* Terminate debugee and mark it as cleaned up */
>> +	igt_assert_eq(kill(debugee->pid, 0), 0);
>> +	igt_assert_eq(debugee->done, 0);
> 
> What are these two asserts doing for the test case?
> 

These are sanity checks before terminating the debugee:
  igt_assert_eq(kill(debugee->pid, 0), 0)
  -> Verifies the process is still alive (signal 0 checks existence 
without killing)
  igt_assert_eq(debugee->done, 0)
  -> Verifies the debugee hasn't already been cleaned up

Both ensure we're in the expected state before SIGKILL.

They could be removed if you consider them redundant - the SIGKILL and 
waitpid would fail anyway if the process is already gone.

> Thanks,
> Dominik Karol
> 
>> +	kill(debugee->pid, SIGKILL);
>> +	igt_assert_eq(waitpid(debugee->pid, NULL, 0), debugee->pid);
>> +	debugee->done = 1;
>> +	debugee->pid = 0;
>> +
>>   	xe_eudebug_debugger_stop_worker(s->debugger);
>>
>>   	xe_eudebug_event_log_print(s->debugger->log, true);
>>   	xe_eudebug_event_log_print(debugee->log, true);
>>
>> -	xe_eudebug_session_check(s, true,
>> XE_EUDEBUG_FILTER_EVENT_VM_BIND |
>> -				 XE_EUDEBUG_FILTER_EVENT_VM_BIND_OP
>> |
>> -
>> XE_EUDEBUG_FILTER_EVENT_VM_BIND_UFENCE);
>> +	/*
>> +	 * Skip xe_eudebug_session_check() because we forcibly killed the
>> debugee
>> +	 * with SIGKILL, so its event log is incomplete and validation would fail.
>> +	 */
>>
>>   	xe_eudebug_client_destroy(debugee);
>>   	xe_eudebug_session_destroy(s);
>> --
>> 2.43.0
> 


  reply	other threads:[~2026-01-28 10:09 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-23 11:35 [PATCH i-g-t 0/1] xe_eudebug_online: Fix test_interrupt_other wait logic Piotr Rudnicki
2026-01-23 11:35 ` [PATCH i-g-t 1/1] tests/intel/xe_eudebug_online: test_interrupt_other wait fix Piotr Rudnicki
2026-01-27 13:42   ` Piatkowski, Dominik Karol
2026-01-28 10:09     ` Rudnicki, Piotr [this message]
2026-01-26  6:13 ` ✓ Xe.CI.BAT: success for xe_eudebug_online: Fix test_interrupt_other wait logic (rev2) Patchwork
2026-01-26  6:27 ` ✗ i915.CI.BAT: failure " Patchwork
2026-01-26  7:19 ` ✗ Xe.CI.Full: " Patchwork
2026-01-26 10:14   ` Rudnicki, Piotr
2026-01-26 10:56 ` ✗ Xe.CI.BAT: " Patchwork
2026-01-26 12:06 ` ✗ Xe.CI.Full: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b13185e5-340d-4947-b420-a24658abb39a@intel.com \
    --to=piotr.rudnicki@intel.com \
    --cc=dominik.karol.piatkowski@intel.com \
    --cc=igt-dev@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox