From: Felix Kuehling <felix.kuehling@amd.com>
To: Jonathan Kim <jonathan.kim@amd.com>,
amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 30/32] drm/amdkfd: add debug queue snapshot operation
Date: Wed, 22 Mar 2023 17:52:07 -0400
Message-ID: <4c403c2d-5bfc-3f8a-b5f4-77ada7b4ea20@amd.com>
In-Reply-To: <20230125195401.4183544-31-jonathan.kim@amd.com>
On 2023-01-25 at 14:53, Jonathan Kim wrote:
> Allow the debugger to get a snapshot of a specified number of queues
> containing various queue property information that is copied to the
> debugger.
>
> Since the debugger doesn't know how many queues exist at any given time,
> allow the debugger to pass the requested number of snapshots as 0 to get
> the actual number of potential snapshots to use for a subsequent snapshot
> request for actual information.
>
> To prevent future ABI breakage, pass in the requested entry_size.
> The KFD will return its own entry_size in case the debugger still wants
> to log the information in a core dump on sizing failure.
>
> Also allow the debugger to clear exceptions when doing a snapshot.
>
> v3: fix uninitialized return and change queue snapshot to type void for
> proper increment on buffer copy.
> use memset 0 to init snapshot entry to clear struct padding.
>
> v2: change buf_size arg to num_queues for clarity.
> fix minimum entry size calculation.
>
> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++
> .../drm/amd/amdkfd/kfd_device_queue_manager.c | 36 ++++++++++++++++
> .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 ++
> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 5 +++
> .../amd/amdkfd/kfd_process_queue_manager.c | 41 +++++++++++++++++++
> 5 files changed, 91 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index d3d2026b6e65..93b288233577 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -2965,6 +2965,12 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, v
> &args->query_exception_info.info_size);
> break;
> case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
> + r = pqm_get_queue_snapshot(&target->pqm,
> + args->queue_snapshot.exception_mask,
> + (void __user *)args->queue_snapshot.snapshot_buf_ptr,
> + &args->queue_snapshot.num_queues,
> + &args->queue_snapshot.entry_size);
> + break;
> case KFD_IOC_DBG_TRAP_GET_DEVICE_SNAPSHOT:
> pr_warn("Debug op %i not supported yet\n", args->op);
> r = -EACCES;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 7792fe9491c5..5ae504a512f0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -3000,6 +3000,42 @@ int suspend_queues(struct kfd_process *p,
> return total_suspended;
> }
>
> +static uint32_t set_queue_type_for_user(struct queue_properties *q_props)
> +{
> + switch (q_props->type) {
> + case KFD_QUEUE_TYPE_COMPUTE:
> + return q_props->format == KFD_QUEUE_FORMAT_PM4
> + ? KFD_IOC_QUEUE_TYPE_COMPUTE
> + : KFD_IOC_QUEUE_TYPE_COMPUTE_AQL;
> + case KFD_QUEUE_TYPE_SDMA:
> + return KFD_IOC_QUEUE_TYPE_SDMA;
> + case KFD_QUEUE_TYPE_SDMA_XGMI:
> + return KFD_IOC_QUEUE_TYPE_SDMA_XGMI;
> + default:
> + WARN_ONCE(true, "queue type not recognized!");
> + return 0xffffffff;
> + };
> +}
> +
> +void set_queue_snapshot_entry(struct queue *q,
> + uint64_t exception_clear_mask,
> + struct kfd_queue_snapshot_entry *qss_entry)
> +{
> + qss_entry->ring_base_address = q->properties.queue_address;
> + qss_entry->write_pointer_address = (uint64_t)q->properties.write_ptr;
> + qss_entry->read_pointer_address = (uint64_t)q->properties.read_ptr;
> + qss_entry->ctx_save_restore_address =
> + q->properties.ctx_save_restore_area_address;
> + qss_entry->ctx_save_restore_area_size =
> + q->properties.ctx_save_restore_area_size;
> + qss_entry->exception_status = q->properties.exception_status;
> + qss_entry->queue_id = q->properties.queue_id;
> + qss_entry->gpu_id = q->device->id;
> + qss_entry->ring_size = (uint32_t)q->properties.queue_size;
> + qss_entry->queue_type = set_queue_type_for_user(&q->properties);
> + q->properties.exception_status &= ~exception_clear_mask;
> +}
> +
> int debug_lock_and_unmap(struct device_queue_manager *dqm)
> {
> int r;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> index 7ccf8d0d1867..89d4a5b293a5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> @@ -296,6 +296,9 @@ int suspend_queues(struct kfd_process *p,
> int resume_queues(struct kfd_process *p,
> uint32_t num_queues,
> uint32_t *usr_queue_id_array);
> +void set_queue_snapshot_entry(struct queue *q,
> + uint64_t exception_clear_mask,
> + struct kfd_queue_snapshot_entry *qss_entry);
> int debug_lock_and_unmap(struct device_queue_manager *dqm);
> int debug_map_and_unlock(struct device_queue_manager *dqm);
> int debug_refresh_runlist(struct device_queue_manager *dqm);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index cfc50d1690c7..cc7816db60eb 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1302,6 +1302,11 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
> void __user *ctl_stack,
> u32 *ctl_stack_used_size,
> u32 *save_area_used_size);
> +int pqm_get_queue_snapshot(struct process_queue_manager *pqm,
> + uint64_t exception_clear_mask,
> + void __user *buf,
> + int *num_qss_entries,
> + uint32_t *entry_size);
>
> int amdkfd_fence_wait_timeout(uint64_t *fence_addr,
> uint64_t fence_value,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 0ae6026c7d69..221cd4b03f1c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -576,6 +576,47 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
> save_area_used_size);
> }
>
> +int pqm_get_queue_snapshot(struct process_queue_manager *pqm,
> + uint64_t exception_clear_mask,
> + void __user *buf,
> + int *num_qss_entries,
> + uint32_t *entry_size)
> +{
> + struct process_queue_node *pqn;
> + uint32_t tmp_entry_size = *entry_size, tmp_qss_entries = *num_qss_entries;
> + int r = 0;
> +
> + *num_qss_entries = 0;
> + if (!(*entry_size))
> + return -EINVAL;
> +
> + *entry_size = min_t(size_t, *entry_size, sizeof(struct kfd_queue_snapshot_entry));
> + mutex_lock(&pqm->process->event_mutex);
> +
> + list_for_each_entry(pqn, &pqm->queues, process_queue_list) {
> + if (!pqn->q)
> + continue;
> +
> + if (*num_qss_entries < tmp_qss_entries) {
> + struct kfd_queue_snapshot_entry src;
> +
> + memset(&src, 0, sizeof(src));
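I'd move the declaration of src up to function scope. That way you only
need to memset it once, outside the loop: set_queue_snapshot_entry()
assigns the same fields on every iteration, so anything it does not touch
stays zero. With that fixed, the patch is
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>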
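
To illustrate the hoisting suggested above, here is a minimal, untested
userspace sketch, not kernel code. struct snapshot_entry, fill_entry() and
snapshot_queues() are made-up stand-ins for struct kfd_queue_snapshot_entry,
set_queue_snapshot_entry() and pqm_get_queue_snapshot(); memcpy() stands in
for copy_to_user(), and locking and error handling are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct kfd_queue_snapshot_entry. */
struct snapshot_entry {
	uint64_t exception_status;
	uint32_t queue_id;	/* typically followed by 4 bytes of tail padding */
};

/* Stand-in for set_queue_snapshot_entry(): assigns every named field. */
static void fill_entry(struct snapshot_entry *e, uint32_t queue_id)
{
	e->exception_status = 0;
	e->queue_id = queue_id;
}

/*
 * Copies up to buf_entries snapshots into buf and returns the total number
 * of queues. *entry_size is clamped to what this side knows about, while
 * the buffer still advances by the caller's original stride.
 */
static uint32_t snapshot_queues(const uint32_t *ids, uint32_t num_queues,
				unsigned char *buf, uint32_t buf_entries,
				uint32_t *entry_size)
{
	struct snapshot_entry src;	/* declared once, at function scope */
	uint32_t stride, i;

	assert(entry_size != NULL && *entry_size != 0);
	stride = *entry_size;
	if (*entry_size > sizeof(src))
		*entry_size = sizeof(src);

	/* Zeroed once: padding is cleared here and never written again. */
	memset(&src, 0, sizeof(src));

	for (i = 0; i < num_queues; i++) {
		if (i < buf_entries) {
			fill_entry(&src, ids[i]);
			memcpy(buf, &src, *entry_size);
			buf += stride;
		}
	}
	return num_queues;
}
```

Like the per-iteration memset in the patch, this relies on the member
stores in fill_entry() leaving the padding bytes alone.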
> +
> + set_queue_snapshot_entry(pqn->q, exception_clear_mask, &src);
> +
> + if (copy_to_user(buf, &src, *entry_size)) {
> + r = -EFAULT;
> + break;
> + }
> + buf += tmp_entry_size;
> + }
> + *num_qss_entries += 1;
> + }
> +
> + mutex_unlock(&pqm->process->event_mutex);
> + return r;
> +}
> +
> static int get_queue_data_sizes(struct kfd_process_device *pdd,
> struct queue *q,
> uint32_t *mqd_size,
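
For reference, the sizing protocol from the commit message (ask with
num_queues = 0, then fetch) might look roughly like this from the debugger
side. This is only a sketch under assumptions: model_get_queue_snapshot()
is a userspace model written for this example, not the real ioctl, and
struct qss_entry and fetch_snapshot() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry layout; the real one is struct kfd_queue_snapshot_entry. */
struct qss_entry {
	uint64_t exception_status;
	uint32_t queue_id;
	uint32_t gpu_id;
};

/* Pretend the target process currently owns three queues. */
static const uint32_t model_queue_ids[] = { 4, 5, 6 };
#define MODEL_NUM_QUEUES 3u

/*
 * Userspace model of the snapshot operation (NOT the real ioctl).
 * In:  *num_queues = entries that fit in buf, *entry_size = caller's stride.
 * Out: *num_queues = queues that exist, *entry_size = bytes written per entry.
 */
static int model_get_queue_snapshot(void *buf, uint32_t *num_queues,
				    uint32_t *entry_size)
{
	uint32_t capacity = *num_queues, stride = *entry_size, i;
	unsigned char *dst = buf;
	struct qss_entry src;

	*num_queues = 0;
	if (!stride)
		return -1;	/* the patch returns -EINVAL here */
	if (*entry_size > sizeof(src))
		*entry_size = sizeof(src);

	memset(&src, 0, sizeof(src));
	for (i = 0; i < MODEL_NUM_QUEUES; i++) {
		if (i < capacity) {
			src.queue_id = model_queue_ids[i];
			memcpy(dst, &src, *entry_size);
			dst += stride;
		}
		*num_queues += 1;
	}
	return 0;
}

/*
 * Debugger side: the first pass has capacity 0 and only learns the count;
 * later passes fetch, retrying if more queues appeared in between.
 */
static int fetch_snapshot(struct qss_entry **out, uint32_t *count)
{
	struct qss_entry *buf = NULL;
	uint32_t capacity = 0, n, entry_size;

	assert(out != NULL && count != NULL);
	for (;;) {
		n = capacity;
		entry_size = sizeof(*buf);
		if (model_get_queue_snapshot(buf, &n, &entry_size) != 0) {
			free(buf);
			return -1;
		}
		if (n <= capacity)
			break;
		free(buf);
		/* calloc: fields an older kernel does not fill read as zero. */
		buf = calloc(n, sizeof(*buf));
		if (!buf)
			return -1;
		capacity = n;
	}
	*out = buf;
	*count = n;
	return 0;
}
```

The caller owns the returned buffer and frees it; with no queues, *out is
NULL and *count is 0.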