Re: [PATCH i-g-t v2 4/4] tests/intel/xe_eudebug_online: Add read/write pagefault online tests

Igt-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
To: "Manszewski, Christoph" <christoph.manszewski@intel.com>,
	<igt-dev@lists.freedesktop.org>
Cc: <andrzej.hajda@intel.com>, <jonathan.cavitt@intel.com>,
	<mika.kuoppala@intel.com>, <dominik.grzegorzek@intel.com>
Subject: Re: [PATCH i-g-t v2 4/4] tests/intel/xe_eudebug_online: Add read/write pagefault online tests
Date: Fri, 22 Nov 2024 16:33:42 +0200	[thread overview]
Message-ID: <296536c1-a5db-4baf-8813-56e67e87dd0b@intel.com> (raw)
In-Reply-To: <f354ed67-11ae-489e-a76f-758a68f52a2a@intel.com>



On 11/22/24 11:55 AM, Manszewski, Christoph wrote:
> Hi Gwan-gyeong,
> 
> On 22.11.2024 09:21, Gwan-gyeong Mun wrote:
>>
>>
>> On 11/21/24 7:12 PM, Manszewski, Christoph wrote:
>>> Hi Gwan-gyeong,
>>>
>>> On 21.11.2024 13:22, Gwan-gyeong Mun wrote:
>>>> Add read and write pagefault tests to xe_eudebug_online that checks 
>>>> if a
>>>> pagefault event is submitted by the KMD debugger when a pagefault 
>>>> occurs.
>>>>
>>>> Test that read (load instruction) and write(store instruction) 
>>>> attempt to
>>>> load or store access to unallocated memory, causing a pagefault.
>>>> Examine the address causing the page fault and the number of eu threads
>>>> causing the pagefault.
>>>>
>>>> v2: Refactor of output attention bits on pagefault event handling 
>>>> (Andrzej)
>>>>      remove / update redudant code (Andrzej, Christoph)
>>>>      use igt_container_of() macro (Christoph)
>>>>
>>>> Co-developed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>>>> Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
>>>> ---
>>>>   tests/intel/xe_eudebug_online.c | 178 ++++++++++++++++++++++++++++ 
>>>> +++-
>>>>   1 file changed, 173 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/tests/intel/xe_eudebug_online.c b/tests/intel/ 
>>>> xe_eudebug_online.c
>>>> index 0ef0d8093..a70d18ee4 100644
>>>> --- a/tests/intel/xe_eudebug_online.c
>>>> +++ b/tests/intel/xe_eudebug_online.c
>>>> @@ -36,6 +36,8 @@
>>>>   #define BB_IN_VRAM            (1 << 11)
>>>>   #define TARGET_IN_SRAM            (1 << 12)
>>>>   #define TARGET_IN_VRAM            (1 << 13)
>>>> +#define SHADER_PAGEFAULT_READ        (1 << 14)
>>>> +#define SHADER_PAGEFAULT_WRITE        (1 << 15)
>>>>   #define TRIGGER_UFENCE_SET_BREAKPOINT    (1 << 24)
>>>>   #define TRIGGER_RESUME_SINGLE_WALK    (1 << 25)
>>>>   #define TRIGGER_RESUME_PARALLEL_WALK    (1 << 26)
>>>> @@ -45,6 +47,7 @@
>>>>   #define TRIGGER_RESUME_DSS        (1 << 30)
>>>>   #define TRIGGER_RESUME_ONE        (1 << 31)
>>>> +#define SHADER_PAGEFAULT    (SHADER_PAGEFAULT_READ | 
>>>> SHADER_PAGEFAULT_WRITE)
>>>>   #define BB_REGION_BITMASK    (BB_IN_SRAM | BB_IN_VRAM)
>>>>   #define TARGET_REGION_BITMASK    (TARGET_IN_SRAM | TARGET_IN_VRAM)
>>>> @@ -61,6 +64,8 @@
>>>>   #define CACHING_VALUE(n)    (CACHING_INIT_VALUE + (n))
>>>>   #define SHADER_CANARY 0x01010101
>>>> +#define BAD_CANARY 0xf1f1f1f
>>>> +#define BAD_OFFSET (0x12345678ull << 12)
>>>>   #define WALKER_X_DIM        4
>>>>   #define WALKER_ALIGNMENT    16
>>>> @@ -120,7 +125,7 @@ static struct intel_buf *create_uc_buf(int fd, 
>>>> int width, int height, uint64_t r
>>>>   static int get_number_of_threads(uint64_t flags)
>>>>   {
>>>> -    if (flags & SHADER_MIN_THREADS)
>>>> +    if (flags & (SHADER_MIN_THREADS | SHADER_PAGEFAULT))
>>>>           return 16;
>>>>       if (flags & (TRIGGER_RESUME_ONE | TRIGGER_RESUME_SINGLE_WALK |
>>>> @@ -179,6 +184,16 @@ static struct gpgpu_shader *get_shader(int fd, 
>>>> const unsigned int flags)
>>>>               gpgpu_shader__common_target_write_u32(shader, s_dim.y 
>>>> + i, CACHING_VALUE(i));
>>>>           gpgpu_shader__nop(shader);
>>>>           gpgpu_shader__breakpoint(shader);
>>>> +    } else if (flags & SHADER_PAGEFAULT) {
>>>> +        if (flags & SHADER_PAGEFAULT_READ)
>>>> +            gpgpu_shader__read_a64_dword(shader, BAD_OFFSET);
>>>> +        else
>>>> +            gpgpu_shader__write_a64_dword(shader, BAD_OFFSET, 
>>>> BAD_CANARY);
>>>> +
>>>> +        gpgpu_shader__label(shader, 0);
>>>> +        gpgpu_shader__write_dword(shader, SHADER_CANARY, 0);
>>>> +        gpgpu_shader__jump_neq(shader, 0, w_dim.y, STEERING_END_LOOP);
>>>> +        gpgpu_shader__write_dword(shader, SHADER_CANARY, 0);
>>>
>>> Now that I think about - do we need this to be a loop? Can't we just 
>>> do the read/write instructions? This would simplify the code and I 
>>> don't yet see why we need to loop within the shader. The SHADER_LOOP 
>>> is used for interrupt-all because we want to interrupt the workload 
>>> from the user/main igt thread. But here, similar to the basic- 
>>> breakpoint test, we just submit a workload that will halt because of 
>>> the hardware/kmd intervention.
>>>
>> the pagefault tests also need this concept.
>>
>> When a pagefault happened, KMD sets “Force Exception / Force External 
>> Halt” in TD_CTL to cause the eu threads to enter SIP mode.
>> In the pagefault handling process of eudebug, kmd installs a null page 
>> at the address where the pagefault happened and makes the halted eu 
>> threads resume (make unhalt).
>>
>> It would be ideal if all unhalted eu threads immediately entered SIP 
>> mode due to the FE/FEH settings, but it may not happen immediately.
>> Therefore, the purpose of using a loop is to ensure that the kernel 
>> shader does not terminate until a pagefault event and attention event 
>> occur by adding an additional instruction after the instruction that 
>> causes the page fault.
>> Therefore, a loop is used to ensure that at least one eu thread must 
>> enter SIP mode.
> 
> Yeah if the count of processed instructions before the exception is not 
> defined then indeed the loop has it's place here. But we still may 
> reduce a little bit of code, see below.
> 
>> The attention callback sets to exit this loop, so this code allows the 
>> eu thread to terminate after the sip shader is processed.
>>
>> Br,
>> G.G.
>>>>       }
>>>>       gpgpu_shader__eot(shader);
>>>> @@ -217,6 +232,16 @@ static int count_set_bits(void *ptr, size_t size)
>>>>       return count;
>>>>   }
>>>> +static int eu_attentions_xor_count(const uint32_t *a, const 
>>>> uint32_t *b, uint32_t size)
>>>> +{
>>>> +    int count = 0;
>>>> +
>>>> +    for (int i = 0; i < size / 4 ; i++)
>>>> +        count += igt_hweight(a[i] ^ b[i]);
>>>> +
>>>> +    return count;
>>>> +}
>>>> +
>>>>   static int count_canaries_eq(uint32_t *ptr, struct dim_t w_dim, 
>>>> uint32_t value)
>>>>   {
>>>>       int count = 0;
>>>> @@ -636,7 +661,7 @@ static void eu_attention_resume_trigger(struct 
>>>> xe_eudebug_debugger *d,
>>>>           }
>>>>       }
>>>> -    if (d->flags & SHADER_LOOP) {
>>>> +    if (d->flags & (SHADER_LOOP | SHADER_PAGEFAULT)) {
>>>
>>> If we drop the loop we can drop also this.
>>>
>>>>           uint32_t threads = get_number_of_threads(d->flags);
>>>>           uint32_t val = STEERING_END_LOOP;
>>>> @@ -746,6 +771,44 @@ static void 
>>>> eu_attention_resume_single_step_trigger(struct xe_eudebug_debugger *
>>>>               data->single_step_bitmask[i] &= ~att->bitmask[i];
>>>>   }
>>>> +static void eu_attention_resume_pagefault_trigger(struct 
>>>> xe_eudebug_debugger *d,
>>>> +                          struct drm_xe_eudebug_event *e)
>>>> +{
>>>> +    struct drm_xe_eudebug_event_eu_attention *att = 
>>>> igt_container_of(e, att, base);
>>>> +    struct online_debug_data *data = d->ptr;
>>>> +    uint32_t bitmask_size = att->bitmask_size;
>>>> +    uint8_t *bitmask;
>>>> +
>>>> +    if (data->last_eu_control_seqno > att->base.seqno)
>>>> +        return;
>>>> +
>>>> +    bitmask = calloc(1, att->bitmask_size);
>>>> +    igt_assert(bitmask);
>>>> +
>>>> +    eu_ctl_stopped(d->fd, att->client_handle, att->exec_queue_handle,
>>>> +               att->lrc_handle, bitmask, &bitmask_size);
>>>> +    igt_assert(bitmask_size == att->bitmask_size);
>>>> +
>>>> +    pthread_mutex_lock(&data->mutex);
>>>> +
>>>> +    if (d->flags & SHADER_PAGEFAULT) {
>>>> +        uint32_t threads = get_number_of_threads(d->flags);
>>>> +        uint32_t val = STEERING_END_LOOP;
>>>> +
>>>> +        igt_assert_eq(pwrite(data->vm_fd, &val, sizeof(uint32_t),
>>>> +                     data->target_offset + steering_offset(threads)),
>>>> +                  sizeof(uint32_t));
>>>> +        fsync(data->vm_fd);
>>>> +    }
>>>
>>> We can also drop this when we remove the loop. Btw. why can't we just 
>>> use 'eu_attention_resume_trigger' instead of this whole function?
> 
> We could remove the 'eu_attention_resume_trigger' like so:
> 
> ```
> diff --git a/tests/intel/xe_eudebug_online.c b/tests/intel/ 
> xe_eudebug_online.c
> index a70d18ee4..c077795ee 100644
> --- a/tests/intel/xe_eudebug_online.c
> +++ b/tests/intel/xe_eudebug_online.c
> @@ -622,7 +622,10 @@ static void eu_attention_resume_trigger(struct 
> xe_eudebug_debugger *d,
>       eu_ctl_stopped(d->fd, att->client_handle, att->exec_queue_handle,
>                  att->lrc_handle, bitmask, &bitmask_size);
>       igt_assert(bitmask_size == att->bitmask_size);
> -    igt_assert(memcmp(bitmask, att->bitmask, att->bitmask_size) == 0);
> +
> +    /* No guarantee that all pagefaulting eu threads will raise 
> attention */
> +    if (!(d->flags & SHADER_PAGEFAULT))
> +        igt_assert(memcmp(bitmask, att->bitmask, att->bitmask_size) == 0);
> 
>       pthread_mutex_lock(&data->mutex);
>       if (igt_nsec_elapsed(&data->exception_arrived) < 
> (MAX_PREEMPT_TIMEOUT + 1) * NSEC_PER_SEC &&
> @@ -771,44 +774,6 @@ static void 
> eu_attention_resume_single_step_trigger(struct xe_eudebug_debugger *
>               data->single_step_bitmask[i] &= ~att->bitmask[i];
>   }
> 
> -static void eu_attention_resume_pagefault_trigger(struct 
> xe_eudebug_debugger *d,
> -                          struct drm_xe_eudebug_event *e)
> -{
> -    struct drm_xe_eudebug_event_eu_attention *att = igt_container_of(e, 
> att, base);
> -    struct online_debug_data *data = d->ptr;
> -    uint32_t bitmask_size = att->bitmask_size;
> -    uint8_t *bitmask;
> -
> -    if (data->last_eu_control_seqno > att->base.seqno)
> -        return;
> -
> -    bitmask = calloc(1, att->bitmask_size);
> -    igt_assert(bitmask);
> -
> -    eu_ctl_stopped(d->fd, att->client_handle, att->exec_queue_handle,
> -               att->lrc_handle, bitmask, &bitmask_size);
> -    igt_assert(bitmask_size == att->bitmask_size);
> -
> -    pthread_mutex_lock(&data->mutex);
> -
> -    if (d->flags & SHADER_PAGEFAULT) {
> -        uint32_t threads = get_number_of_threads(d->flags);
> -        uint32_t val = STEERING_END_LOOP;
> -
> -        igt_assert_eq(pwrite(data->vm_fd, &val, sizeof(uint32_t),
> -                     data->target_offset + steering_offset(threads)),
> -                  sizeof(uint32_t));
> -        fsync(data->vm_fd);
> -    }
> -    pthread_mutex_unlock(&data->mutex);
> -
> -    data->last_eu_control_seqno = eu_ctl_resume(d->master_fd, d->fd, 
> att->client_handle,
> -                            att->exec_queue_handle, att->lrc_handle,
> -                            bitmask, att->bitmask_size);
> -
> -    free(bitmask);
> -}
> -
>   static void open_trigger(struct xe_eudebug_debugger *d,
>                struct drm_xe_eudebug_event *e)
>   {
> @@ -1530,7 +1495,7 @@ static void test_pagefault_online(int fd, struct 
> drm_xe_engine_class_instance *h
>       xe_eudebug_debugger_add_trigger(s->debugger, 
> DRM_XE_EUDEBUG_EVENT_EU_ATTENTION,
>                       eu_attention_debug_trigger);
>       xe_eudebug_debugger_add_trigger(s->debugger, 
> DRM_XE_EUDEBUG_EVENT_EU_ATTENTION,
> -                    eu_attention_resume_pagefault_trigger);
> +                    eu_attention_resume_trigger);
>       xe_eudebug_debugger_add_trigger(s->debugger, 
> DRM_XE_EUDEBUG_EVENT_VM, vm_open_trigger);
>       xe_eudebug_debugger_add_trigger(s->debugger, 
> DRM_XE_EUDEBUG_EVENT_METADATA,
>                       create_metadata_trigger);
> ```
> 
> Does this look reasonable? I know it adds yet another path to 
> 'eu_attention_resume_trigger' but you partially account for the 
> pagefault shader in your current code anyway.
> 
Yes, right, I added a separate callback (for readability) because I 
didn't need the non-pagefault scenario checking routine of the attention 
callback.
In order to keep the code additions in this patch as small as possible, 
I'll integrate to handle pagefault scenario in 
eu_attention_resume_trigger() callback, as you suggested.

Thanks,
G.G.

> Thanks,
> Christoph
>

next prev parent reply	other threads:[~2024-11-22 14:34 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-21 12:22 [PATCH i-g-t v2 0/4] tests/intel/xe_eudebug_online: Introduce read/write pagefault tests Gwan-gyeong Mun
2024-11-21 12:22 ` [PATCH i-g-t v2 1/4] lib/gppgu_shader: Add write D32 to ppgtt virtual address Gwan-gyeong Mun
2024-11-21 16:08   ` Hajda, Andrzej
2024-11-22  7:51     ` Gwan-gyeong Mun
2024-11-22 10:47       ` Hajda, Andrzej
2024-11-22 14:28         ` Gwan-gyeong Mun
2024-11-21 12:22 ` [PATCH i-g-t v2 2/4] lib/gppgu_shader: Add read D32 from " Gwan-gyeong Mun
2024-11-21 16:14   ` Hajda, Andrzej
2024-11-22  7:54     ` Gwan-gyeong Mun
2024-11-21 12:22 ` [PATCH i-g-t v2 3/4] eudebug: Add eudebug pagefault event declarations Gwan-gyeong Mun
2024-11-21 12:22 ` [PATCH i-g-t v2 4/4] tests/intel/xe_eudebug_online: Add read/write pagefault online tests Gwan-gyeong Mun
2024-11-21 16:17   ` Hajda, Andrzej
2024-11-21 17:12   ` Manszewski, Christoph
2024-11-22  8:21     ` Gwan-gyeong Mun
2024-11-22  9:55       ` Manszewski, Christoph
2024-11-22 14:33         ` Gwan-gyeong Mun [this message]
2024-11-21 14:36 ` ✓ Xe.CI.BAT: success for tests/intel/xe_eudebug_online: Introduce read/write pagefault tests (rev2) Patchwork
2024-11-21 14:51 ` ✗ i915.CI.BAT: failure " Patchwork
2024-11-21 21:16 ` ✗ Xe.CI.Full: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=296536c1-a5db-4baf-8813-56e67e87dd0b@intel.com \
    --to=gwan-gyeong.mun@intel.com \
    --cc=andrzej.hajda@intel.com \
    --cc=christoph.manszewski@intel.com \
    --cc=dominik.grzegorzek@intel.com \
    --cc=igt-dev@lists.freedesktop.org \
    --cc=jonathan.cavitt@intel.com \
    --cc=mika.kuoppala@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox