From: Felix Kuehling <felix.kuehling@amd.com>
To: Jonathan Kim <jonathan.kim@amd.com>,
amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 25/32] drm/amdkfd: add debug suspend and resume process queues operation
Date: Tue, 21 Mar 2023 18:16:03 -0400 [thread overview]
Message-ID: <97b3ede7-dce8-59be-eadb-19e2101dfaa6@amd.com> (raw)
In-Reply-To: <20230125195401.4183544-26-jonathan.kim@amd.com>
On 2023-01-25 14:53, Jonathan Kim wrote:
> In order to inspect waves from the saved context at any point during a
> debug session, the debugger must be able to preempt queues to trigger
> context save by suspending them.
>
> On queue suspend, the KFD will copy the context save header information
> so that the debugger can correctly crawl the appropriate size of the saved
> context. The debugger must then also be allowed to resume suspended queues.
>
> A queue that is newly created cannot be suspended because queue ids are
> recycled after destruction so the debugger needs to know that this has
> occurred. Query functions will be later added that will clear a given
> queue of its new queue status.
>
> A queue cannot be destroyed while it is suspended to preserve its saved
> context during debugger inspection. Have queue destruction block while
> a queue is suspended and unblocked when it is resumed. Likewise, if a
> queue is about to be destroyed, it cannot be suspended.
>
> Return the number of queues successfully suspended or resumed along with
> a per queue status array where the upper bits per queue status show that
> the request was invalid (new/destroyed queue suspend request, missing
> queue) or an error occurred (HWS in a fatal state so it can't suspend or
> resume queues).
>
> v2: add gfx11/mes support.
> prevent header copy on suspend from overwriting user fields.
> simplify resume_queues function.
> address other nit-picks
>
> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 +
> drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 11 +
> drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 7 +
> .../drm/amd/amdkfd/kfd_device_queue_manager.c | 446 +++++++++++++++++-
> .../drm/amd/amdkfd/kfd_device_queue_manager.h | 10 +
> .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c | 14 +
> .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 11 +-
> .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 18 +-
> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 5 +-
> 10 files changed, 518 insertions(+), 10 deletions(-)
>
[snip]
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
> index 50da16dd4c96..047c43418a1a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
> @@ -288,6 +288,11 @@ static int get_wave_state(struct mqd_manager *mm, void *mqd,
> u32 *save_area_used_size)
> {
> struct v9_mqd *m;
> + struct kfd_context_save_area_header header;
> + size_t header_copy_size = sizeof(header.control_stack_size) +
> + sizeof(header.wave_state_size) +
> + sizeof(header.wave_state_offset) +
> + sizeof(header.control_stack_offset);
This makes assumptions about the structure layout. I'd feel better if
these fields were in a sub-structure, which would make this easier and
safer to handle.
struct kfd_context_save_area_header {
struct {
__u32 control_stack_offset;
__u32 control_stack_size;
__u32 wave_state_offset;
__u32 wave_state_size;
} wave_state;
...
};
...
|static int get_wave_state(...) { struct kfd_context_save_area_header
header; ... header.wave_state.control_stack_size = *ctl_stack_used_size;
header.wave_state.wave_state_size = *save_area_used_size;
header.wave_state.wave_state_offset = m->cp_hqd_wg_state_offset;
header.wave_state.control_stack_offset = m->cp_hqd_cntl_stack_offset; if
(copy_to_user(ctl_stack, &header.wave_state, sizeof(header.wave_state)))
return -EFAULT; ... } |
This way you're sure you only copy initialized data. The only assumption
this still makes is, that wave_state is at the start of the header
structure.
Regards,
Felix
>
> /* Control stack is located one page after MQD. */
> void *mqd_ctl_stack = (void *)((uintptr_t)mqd + PAGE_SIZE);
> @@ -299,7 +304,18 @@ static int get_wave_state(struct mqd_manager *mm, void *mqd,
> *save_area_used_size = m->cp_hqd_wg_state_offset -
> m->cp_hqd_cntl_stack_size;
>
> - if (copy_to_user(ctl_stack, mqd_ctl_stack, m->cp_hqd_cntl_stack_size))
> + header.control_stack_size = *ctl_stack_used_size;
> + header.wave_state_size = *save_area_used_size;
> +
> + header.wave_state_offset = m->cp_hqd_wg_state_offset;
> + header.control_stack_offset = m->cp_hqd_cntl_stack_offset;
> +
> + if (copy_to_user(ctl_stack, &header, header_copy_size))
> + return -EFAULT;
> +
> + if (copy_to_user(ctl_stack + m->cp_hqd_cntl_stack_offset,
> + mqd_ctl_stack + m->cp_hqd_cntl_stack_offset,
> + *ctl_stack_used_size))
> return -EFAULT;
>
> return 0;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 6f7dc23af104..8dc7cc1e18a5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -477,6 +477,8 @@ struct queue_properties {
> uint32_t doorbell_off;
> bool is_interop;
> bool is_evicted;
> + bool is_suspended;
> + bool is_being_destroyed;
> bool is_active;
> bool is_gws;
> bool is_dbg_wa;
> @@ -501,7 +503,8 @@ struct queue_properties {
> #define QUEUE_IS_ACTIVE(q) ((q).queue_size > 0 && \
> (q).queue_address != 0 && \
> (q).queue_percent > 0 && \
> - !(q).is_evicted)
> + !(q).is_evicted && \
> + !(q).is_suspended)
>
> enum mqd_update_flag {
> UPDATE_FLAG_DBG_WA_ENABLE = 1,
next prev parent reply other threads:[~2023-03-21 22:16 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-25 19:53 [PATCH 00/32] Upstream of kernel support for AMDGPU ISA debugging Jonathan Kim
2023-01-25 19:53 ` [PATCH 01/32] drm/amdkfd: add debug and runtime enable interface Jonathan Kim
2023-02-16 22:16 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 02/32] drm/amdkfd: display debug capabilities Jonathan Kim
2023-02-16 22:24 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 03/32] drm/amdkfd: prepare per-process debug enable and disable Jonathan Kim
2023-02-16 23:44 ` Felix Kuehling
2023-03-23 19:12 ` Kim, Jonathan
2023-03-23 20:08 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 04/32] drm/amdgpu: add kgd hw debug mode setting interface Jonathan Kim
2023-01-25 19:53 ` [PATCH 05/32] drm/amdgpu: setup hw debug registers on driver initialization Jonathan Kim
2023-02-16 22:39 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 06/32] drm/amdgpu: add gfx9 hw debug mode enable and disable calls Jonathan Kim
2023-01-29 5:12 ` kernel test robot
2023-02-16 22:54 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 07/32] drm/amdgpu: add gfx9.4.1 " Jonathan Kim
2023-01-29 6:34 ` kernel test robot
2023-02-16 23:01 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 08/32] drm/amdgpu: add gfx10 " Jonathan Kim
2023-01-29 7:55 ` kernel test robot
2023-02-16 23:11 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 09/32] drm/amdgpu: add gfx9.4.2 " Jonathan Kim
2023-02-16 23:14 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 10/32] drm/amdgpu: add gfx11 " Jonathan Kim
2023-02-16 23:19 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 11/32] drm/amdgpu: add configurable grace period for unmap queues Jonathan Kim
2023-03-20 19:19 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 12/32] drm/amdkfd: prepare map process for single process debug devices Jonathan Kim
2023-03-20 20:06 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 13/32] drm/amdgpu: prepare map process for multi-process " Jonathan Kim
2023-03-20 20:16 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 14/32] drm/amdgpu: expose debug api for mes Jonathan Kim
2023-03-20 20:47 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 15/32] drm/amdkfd: prepare trap workaround for gfx11 Jonathan Kim
2023-03-20 21:49 ` Felix Kuehling
2023-03-23 13:50 ` Kim, Jonathan
2023-03-23 14:00 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 16/32] drm/amdkfd: add per process hw trap enable and disable functions Jonathan Kim
2023-03-20 23:06 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 17/32] drm/amdkfd: add raise exception event function Jonathan Kim
2023-03-20 23:18 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 18/32] drm/amdkfd: add send exception operation Jonathan Kim
2023-03-20 23:26 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 19/32] drm/amdkfd: add runtime enable operation Jonathan Kim
2023-03-21 0:31 ` Felix Kuehling
2023-03-23 19:45 ` Kim, Jonathan
2023-01-25 19:53 ` [PATCH 20/32] drm/amdkfd: add debug trap enabled flag to tma Jonathan Kim
2023-01-25 19:53 ` [PATCH 21/32] drm/amdkfd: update process interrupt handling for debug events Jonathan Kim
2023-03-21 21:07 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 22/32] drm/amdkfd: add debug set exceptions enabled operation Jonathan Kim
2023-01-25 19:53 ` [PATCH 23/32] drm/amdkfd: add debug wave launch override operation Jonathan Kim
2023-03-21 21:37 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 24/32] drm/amdkfd: add debug wave launch mode operation Jonathan Kim
2023-03-21 21:42 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 25/32] drm/amdkfd: add debug suspend and resume process queues operation Jonathan Kim
2023-03-21 22:16 ` Felix Kuehling [this message]
2023-01-25 19:53 ` [PATCH 26/32] drm/amdkfd: add debug set and clear address watch points operation Jonathan Kim
2023-03-22 21:38 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 27/32] drm/amdkfd: add debug set flags operation Jonathan Kim
2023-03-22 21:47 ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 28/32] drm/amdkfd: add debug query event operation Jonathan Kim
2023-01-25 19:53 ` [PATCH 29/32] drm/amdkfd: add debug query exception info operation Jonathan Kim
2023-01-25 19:53 ` [PATCH 30/32] drm/amdkfd: add debug queue snapshot operation Jonathan Kim
2023-03-22 21:52 ` Felix Kuehling
2023-01-25 19:54 ` [PATCH 31/32] drm/amdkfd: add debug device " Jonathan Kim
2023-03-22 21:54 ` Felix Kuehling
2023-01-25 19:54 ` [PATCH 32/32] drm/amdkfd: bump kfd ioctl minor version for debug api availability Jonathan Kim
2023-03-22 21:56 ` Felix Kuehling
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=97b3ede7-dce8-59be-eadb-19e2101dfaa6@amd.com \
--to=felix.kuehling@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=jonathan.kim@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox