AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Felix Kuehling <felix.kuehling@amd.com>
To: Jonathan Kim <jonathan.kim@amd.com>, amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 27/29] drm/amdkfd: add debug queue snapshot operation
Date: Wed, 30 Nov 2022 18:55:07 -0500	[thread overview]
Message-ID: <65d1b014-9d27-e870-8f2b-fed9d5f02729@amd.com> (raw)
In-Reply-To: <20221031162359.445805-27-jonathan.kim@amd.com>


On 2022-10-31 12:23, Jonathan Kim wrote:
> Allow the debugger to get a snapshot of a specified number of queues
> containing various queue property information that is copied to the
> debugger.
>
> Since the debugger doesn't know how many queues exist at any given time,
> allow the debugger to pass the requested number of snapshots as 0 to get
> the actual number of potential snapshots to use for a subsequent snapshot
> request for actual information.
>
> To prevent future ABI breakage, pass in the requested entry_size.
> The KFD will return it's own entry_size in case the debugger still wants
> log the information in a core dump on sizing failure.
>
> Also allow the debugger to clear exceptions when doing a snapshot.
>
> v2: change buf_size arg to num_queues for clarity.
> fix minimum entry size calculation.
>
> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>

Two nit-picks inline.


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  6 +++
>   .../drm/amd/amdkfd/kfd_device_queue_manager.c | 41 +++++++++++++++++++
>   .../drm/amd/amdkfd/kfd_device_queue_manager.h |  4 ++
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  5 +++
>   .../amd/amdkfd/kfd_process_queue_manager.c    | 40 ++++++++++++++++++
>   5 files changed, 96 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 2c8f107237ee..cea393350980 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -2961,6 +2961,12 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, v
>   				&args->query_exception_info.info_size);
>   		break;
>   	case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
> +		r = pqm_get_queue_snapshot(&target->pqm,
> +				args->queue_snapshot.exception_mask,
> +				(void __user *)args->queue_snapshot.snapshot_buf_ptr,
> +				&args->queue_snapshot.num_queues,
> +				&args->queue_snapshot.entry_size);
> +		break;
>   	case KFD_IOC_DBG_TRAP_GET_DEVICE_SNAPSHOT:
>   		pr_warn("Debug op %i not supported yet\n", args->op);
>   		r = -EACCES;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 589efbefc8dc..51f8c5676c56 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -2950,6 +2950,47 @@ int suspend_queues(struct kfd_process *p,
>   	return total_suspended;
>   }
>   
> +static uint32_t set_queue_type_for_user(struct queue_properties *q_props)
> +{
> +	switch (q_props->type) {
> +	case KFD_QUEUE_TYPE_COMPUTE:
> +		return q_props->format == KFD_QUEUE_FORMAT_PM4
> +					? KFD_IOC_QUEUE_TYPE_COMPUTE
> +					: KFD_IOC_QUEUE_TYPE_COMPUTE_AQL;
> +	case KFD_QUEUE_TYPE_SDMA:
> +		return KFD_IOC_QUEUE_TYPE_SDMA;
> +	case KFD_QUEUE_TYPE_SDMA_XGMI:
> +		return KFD_IOC_QUEUE_TYPE_SDMA_XGMI;
> +	default:
> +		WARN_ONCE(true, "queue type not recognized!");
> +		return 0xffffffff;
> +	};
> +}
> +
> +void set_queue_snapshot_entry(struct device_queue_manager *dqm,
> +			      struct queue *q,
> +			      uint64_t exception_clear_mask,
> +			      struct kfd_queue_snapshot_entry *qss_entry)

The dqm parameter is not needed. The function can get this from 
q->device->dqm. It's also only needed for dqm locking. I'm not sure 
that's even necessary. Aren't the event_mutex and target process mutex 
held by the caller enough to protect the exception_status and other 
queue properties?


> +{
> +	dqm_lock(dqm);
> +
> +	qss_entry->ring_base_address = q->properties.queue_address;
> +	qss_entry->write_pointer_address = (uint64_t)q->properties.write_ptr;
> +	qss_entry->read_pointer_address = (uint64_t)q->properties.read_ptr;
> +	qss_entry->ctx_save_restore_address =
> +				q->properties.ctx_save_restore_area_address;
> +	qss_entry->ctx_save_restore_area_size =
> +				q->properties.ctx_save_restore_area_size;
> +	qss_entry->exception_status = q->properties.exception_status;
> +	qss_entry->queue_id = q->properties.queue_id;
> +	qss_entry->gpu_id = q->device->id;
> +	qss_entry->ring_size = (uint32_t)q->properties.queue_size;
> +	qss_entry->queue_type = set_queue_type_for_user(&q->properties);
> +	q->properties.exception_status &= ~exception_clear_mask;
> +
> +	dqm_unlock(dqm);
> +}
> +
>   int debug_lock_and_unmap(struct device_queue_manager *dqm)
>   {
>   	int r;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> index 12643528684c..094705b932fc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> @@ -297,6 +297,10 @@ int resume_queues(struct kfd_process *p,
>   		bool resume_all_queues,
>   		uint32_t num_queues,
>   		uint32_t *usr_queue_id_array);
> +void set_queue_snapshot_entry(struct device_queue_manager *dqm,
> +			      struct queue *q,
> +			      uint64_t exception_clear_mask,
> +			      struct kfd_queue_snapshot_entry *qss_entry);
>   int debug_lock_and_unmap(struct device_queue_manager *dqm);
>   int debug_map_and_unlock(struct device_queue_manager *dqm);
>   int debug_refresh_runlist(struct device_queue_manager *dqm);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index aee4fe20e676..ebd701143981 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1302,6 +1302,11 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
>   		       void __user *ctl_stack,
>   		       u32 *ctl_stack_used_size,
>   		       u32 *save_area_used_size);
> +int pqm_get_queue_snapshot(struct process_queue_manager *pqm,
> +			   uint64_t exception_clear_mask,
> +			   struct kfd_queue_snapshot_entry __user *buf,
> +			   int *num_qss_entries,
> +			   uint32_t *entry_size);
>   
>   int amdkfd_fence_wait_timeout(uint64_t *fence_addr,
>   			      uint64_t fence_value,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 15db83c9a585..30df1046c30b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -569,6 +569,46 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
>   						       save_area_used_size);
>   }
>   
> +int pqm_get_queue_snapshot(struct process_queue_manager *pqm,
> +			   uint64_t exception_clear_mask,
> +			   struct kfd_queue_snapshot_entry __user *buf,
> +			   int *num_qss_entries,
> +			   uint32_t *entry_size)
> +{
> +	struct process_queue_node *pqn;
> +	uint32_t tmp_entry_size = *entry_size, tmp_qss_entries = *num_qss_entries;
> +	int r;
> +
> +	*num_qss_entries = 0;
> +	if (!(*entry_size))
> +		return -EINVAL;
> +
> +	*entry_size = min_t(size_t, *entry_size, sizeof(struct kfd_queue_snapshot_entry));
> +	mutex_lock(&pqm->process->event_mutex);
> +
> +	list_for_each_entry(pqn, &pqm->queues, process_queue_list) {
> +		if (!pqn->q)
> +			continue;
> +
> +		if (*num_qss_entries < tmp_qss_entries) {
> +			struct kfd_queue_snapshot_entry src = {0};

It's safer to use memset here. This initialization may not initialize 
padding, so it doesn't guarantee that no uninitialized data leaks from 
kernel mode to user mode.

Regards,
   Felix


> +
> +			set_queue_snapshot_entry(pqn->q->device->dqm,
> +					pqn->q, exception_clear_mask, &src);
> +
> +			if (copy_to_user(buf, &src, *entry_size)) {
> +				r = -EFAULT;
> +				break;
> +			}
> +			buf += tmp_entry_size;
> +		}
> +		*num_qss_entries += 1;
> +	}
> +
> +	mutex_unlock(&pqm->process->event_mutex);
> +	return r;
> +}
> +
>   static int get_queue_data_sizes(struct kfd_process_device *pdd,
>   				struct queue *q,
>   				uint32_t *mqd_size,

  reply	other threads:[~2022-11-30 23:55 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-31 16:23 [PATCH 01/29] drm/amdkfd: add debug and runtime enable interface Jonathan Kim
2022-10-31 16:23 ` [PATCH 02/29] drm/amdkfd: display debug capabilities Jonathan Kim
2022-11-22 23:08   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 03/29] drm/amdkfd: prepare per-process debug enable and disable Jonathan Kim
2022-11-22 23:31   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 04/29] drm/amdgpu: add kgd hw debug mode setting interface Jonathan Kim
2022-12-01  0:08   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 05/29] drm/amdgpu: setup hw debug registers on driver initialization Jonathan Kim
2022-11-22 23:38   ` Felix Kuehling
2022-11-23 20:53     ` Kim, Jonathan
2022-12-01  0:18     ` Felix Kuehling
2022-12-01  0:23   ` Felix Kuehling
2022-12-02 17:42     ` Kim, Jonathan
2022-10-31 16:23 ` [PATCH 06/29] drm/amdgpu: add gfx9 hw debug mode enable and disable calls Jonathan Kim
2022-11-22 23:50   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 07/29] drm/amdgpu: add gfx9.4.1 " Jonathan Kim
2022-11-22 23:59   ` Felix Kuehling
2022-11-24 14:58     ` Kim, Jonathan
2022-11-24 16:25       ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 08/29] drm/amdgpu: add gfx10 " Jonathan Kim
2022-10-31 16:23 ` [PATCH 09/29] drm/amdgpu: add gfx9.4.2 " Jonathan Kim
2022-10-31 16:23 ` [PATCH 10/29] drm/amdgpu: add configurable grace period for unmap queues Jonathan Kim
2022-11-23  0:21   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 11/29] drm/amdkfd: prepare map process for single process debug devices Jonathan Kim
2022-10-31 16:23 ` [PATCH 12/29] drm/amdgpu: prepare map process for multi-process " Jonathan Kim
2022-10-31 16:23 ` [PATCH 13/29] drm/amdkfd: add per process hw trap enable and disable functions Jonathan Kim
2022-10-31 16:23 ` [PATCH 14/29] drm/amdkfd: add raise exception event function Jonathan Kim
2022-10-31 16:23 ` [PATCH 15/29] drm/amdkfd: add send exception operation Jonathan Kim
2022-10-31 16:23 ` [PATCH 16/29] drm/amdkfd: add runtime enable operation Jonathan Kim
2022-11-23  0:52   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 17/29] drm/amdkfd: Add debug trap enabled flag to TMA Jonathan Kim
2022-11-23  0:44   ` Felix Kuehling
2022-11-24 14:51     ` Kim, Jonathan
2022-11-24 16:23       ` Felix Kuehling
2022-11-24 20:27         ` Kim, Jonathan
2022-11-25 16:53           ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 18/29] drm/amdkfd: update process interrupt handling for debug events Jonathan Kim
2022-10-31 16:23 ` [PATCH 19/29] drm/amdkfd: add debug set exceptions enabled operation Jonathan Kim
2022-11-24 21:24   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 20/29] drm/amdkfd: add debug wave launch override operation Jonathan Kim
2022-11-29 22:37   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 21/29] drm/amdkfd: add debug wave launch mode operation Jonathan Kim
2022-12-01  0:02   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 22/29] drm/amdkfd: add debug suspend and resume process queues operation Jonathan Kim
2022-11-29 23:55   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 23/29] drm/amdkfd: add debug set and clear address watch points operation Jonathan Kim
2022-11-30  0:34   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 24/29] drm/amdkfd: add debug set flags operation Jonathan Kim
2022-11-30  0:39   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 25/29] drm/amdkfd: add debug query event operation Jonathan Kim
2022-11-30  0:44   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 26/29] drm/amdkfd: add debug query exception info operation Jonathan Kim
2022-11-30  0:50   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 27/29] drm/amdkfd: add debug queue snapshot operation Jonathan Kim
2022-11-30 23:55   ` Felix Kuehling [this message]
2022-12-02 19:13     ` Kim, Jonathan
2022-10-31 16:23 ` [PATCH 28/29] drm/amdkfd: add debug device " Jonathan Kim
2022-12-01  0:00   ` Felix Kuehling
2022-10-31 16:23 ` [PATCH 29/29] drm/amdkfd: bump kfd ioctl minor version for debug api availability Jonathan Kim
2022-12-01  0:00   ` Felix Kuehling
2022-11-22 23:05 ` [PATCH 01/29] drm/amdkfd: add debug and runtime enable interface Felix Kuehling
2022-11-23 20:45   ` Kim, Jonathan
  -- strict thread matches above, loose matches on Subject: below --
2022-08-29 14:29 [PATCH 0/29] Introduce AMD GPU ISA Debugging for HSA Compute Jonathan Kim
2022-08-29 14:30 ` [PATCH 27/29] drm/amdkfd: add debug queue snapshot operation Jonathan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=65d1b014-9d27-e870-8f2b-fed9d5f02729@amd.com \
    --to=felix.kuehling@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=jonathan.kim@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox