public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/6] drm/amdgpu: Add new reset option and rework coredump
From: André Almeida @ 2023-07-11 21:34 UTC
  To: dri-devel, amd-gfx, linux-kernel
  Cc: kernel-dev, alexander.deucher, christian.koenig,
	pierre-eric.pelloux-prayer, 'Marek Olšák',
	Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf,
	michel.daenzer, André Almeida

Hi,

The goal of this patchset is to improve debugging device resets on amdgpu.

The first patch creates a new module parameter to disable soft recoveries,
ensuring every recovery goes through the full device reset and making it easier
to generate resets from userspace tools like [0] and [1]. This is important for
validating how the stack behaves on resets, end to end.

The second patch is a small addition to mark jobs that cause soft recoveries
as guilty, for API consistency.

The last patches rework the coredump to store more information in devcoredump
files, making them more useful to attach to bug reports.

The new coredump content looks like this:

   **** AMDGPU Device Coredump ****
   version: 1
   kernel: 6.4.0-rc7-tony+
   module: amdgpu
   time: 702.743534320
   process_name: vulkan-triangle PID: 4561
   IBs:
   	[0] 0xffff800100545000
   	[1] 0xffff800100001000
   ring name: gfx_0.0.0

Due to nested IBs, the IBs listed may not be the ones that really caused the
hang, but they give some direction.
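
For anyone who wants to pull these fields out of a dump attached to a bug
report, here is a minimal sketch of a parser for the text layout shown above.
It is not part of this series. The function name parse_coredump and the
SAMPLE constant are hypothetical, and the layout is assumed to be exactly the
version 1 example in this letter (the process line may be missing, for example
for kernel threads, and is then simply left out of the result).

```python
import re

# Sample text copied from the example in this cover letter (assumption:
# real dumps follow the same "key: value" layout for version 1).
SAMPLE = """\
**** AMDGPU Device Coredump ****
version: 1
kernel: 6.4.0-rc7-tony+
module: amdgpu
time: 702.743534320
process_name: vulkan-triangle PID: 4561
IBs:
\t[0] 0xffff800100545000
\t[1] 0xffff800100001000
ring name: gfx_0.0.0
"""


def parse_coredump(text):
    """Parse the coredump text into a dict; IB addresses become ints."""
    info = {"ibs": []}
    in_ibs = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("****"):
            continue  # skip blank lines and the banner
        ib = re.fullmatch(r"\[(\d+)\]\s+(0x[0-9a-fA-F]+)", line)
        if in_ibs and ib:
            info["ibs"].append(int(ib.group(2), 16))
            continue
        in_ibs = False
        if line == "IBs:":
            in_ibs = True  # following "[n] 0x..." lines are IB addresses
            continue
        proc = re.fullmatch(r"process_name:\s*(.*?)\s+PID:\s*(\d+)", line)
        if proc:
            info["process_name"] = proc.group(1)
            info["pid"] = int(proc.group(2))
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()  # generic "key: value" line
    return info


if __name__ == "__main__":
    print(parse_coredump(SAMPLE))
```

The devcoredump framework normally exposes the dump as a data file under
/sys/class/devcoredump/, so the contents of that file could be passed to the
function in place of SAMPLE.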

Thanks,
	André

[0] https://gitlab.freedesktop.org/andrealmeid/gpu-timeout
[1] https://github.com/andrealmeid/vulkan-triangle-v1

André Almeida (6):
  drm/amdgpu: Create a module param to disable soft recovery
  drm/amdgpu: Mark contexts guilty for causing soft recoveries
  drm/amdgpu: Rework coredump to use memory dynamically
  drm/amdgpu: Limit info in coredump for kernel threads
  drm/amdgpu: Log IBs and ring name at coredump
  drm/amdgpu: Create version number for coredumps

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 21 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c    |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 99 +++++++++++++++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  9 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  6 +-
 5 files changed, 112 insertions(+), 29 deletions(-)

-- 
2.41.0



Thread overview: 14+ messages
2023-07-11 21:34 [PATCH 0/6] drm/amdgpu: Add new reset option and rework coredump André Almeida
2023-07-11 21:34 ` [PATCH 1/6] drm/amdgpu: Create a module param to disable soft recovery André Almeida
2023-07-11 21:34 ` [PATCH 2/6] drm/amdgpu: Mark contexts guilty for causing soft recoveries André Almeida
2023-07-12  8:43   ` Christian König
2023-07-11 21:34 ` [PATCH 3/6] drm/amdgpu: Rework coredump to use memory dynamically André Almeida
2023-07-12  8:37   ` Christian König
2023-07-12  8:59     ` Lucas Stach
2023-07-12 10:39       ` Christian König
2023-07-12 10:46         ` Lucas Stach
2023-07-12 10:56         ` Lucas Stach
2023-07-12 11:36           ` Christian König
2023-07-11 21:34 ` [PATCH 4/6] drm/amdgpu: Limit info in coredump for kernel threads André Almeida
2023-07-11 21:35 ` [PATCH 5/6] drm/amdgpu: Log IBs and ring name at coredump André Almeida
2023-07-11 21:35 ` [PATCH 6/6] drm/amdgpu: Create version number for coredumps André Almeida
