From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: Lijo Lazar <lijo.lazar@amd.com>, amd-gfx@lists.freedesktop.org
Cc: Alexander.Deucher@amd.com, Asad.Kamal@amd.com, Hawking.Zhang@amd.com
Subject: Re: [PATCH 0/5] Add work pool to reset domain
Date: Sat, 12 Aug 2023 10:23:11 +0200
Message-ID: <f01ee061-ba70-ce2b-e52e-79ba273234c2@gmail.com>
In-Reply-To: <20230811060234.663789-1-lijo.lazar@amd.com>

Am 11.08.23 um 08:02 schrieb Lijo Lazar:
> Presently, there are multiple clients of reset like RAS, job timeout, KFD hang
> detection and debug method. Instead of each client maintaining a work item,
> reset work pool is moved to reset domain. When a client makes a recovery request,
> a work item is allocated by the reset domain and queued for execution. For the
> case of job timeout, each ring has its own TDR queue to which tdr work is
> scheduled. From there, it's further queued to a reset domain based on the device
> configuration.
>
> This allows flexibility to have multiple reset domains. For example, when
> there are partitions, each partition can maintain its own reset domain and a job
> timeout on one partition doesn't affect jobs on the other partition (when the
> jobs don't have any interdependency). The reset logic will select the
> appropriate reset domain based on the current device configuration.

Well completely NAK to that design.

We intentionally added the workqueue to serialize *all* reset work and I 
absolutely don't see any reason to change that.

Regards,
Christian.

>
> Lijo Lazar (5):
>    drm/amdgpu: Add work pool to reset domain
>    drm/amdgpu: Move to reset_schedule_work
>    drm/amdgpu: Set flags to cancel all pending resets
>    drm/amdgpu: Add API to queue and do reset work
>    drm/amdgpu: Add TDR queue for ring
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   2 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  32 +++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |   1 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  24 +---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  40 +++----
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  16 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  71 ++++++------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  | 122 ++++++++++++++++++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  |  32 +++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |   5 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |   1 -
>   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c      |  38 +++----
>   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c      |  44 ++++----
>   drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c      |  33 +++---
>   15 files changed, 285 insertions(+), 177 deletions(-)
>
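The serialization the reply argues for can be illustrated with a small userspace model. This is a hypothetical sketch, not amdgpu code. It assumes, as the reply states, that every reset client (RAS, job timeout, KFD hang detection) funnels its request into one queue whose single consumer runs one reset at a time. In the kernel that role is played by a workqueue that executes one item at a time, for example one created with alloc_ordered_workqueue(). Here a pthread worker draining a FIFO stands in for it, and the names reset_domain, reset_domain_queue, reset_worker and run_demo are invented for illustration.

```c
/* Hypothetical userspace model (not amdgpu code): many reset clients
 * enqueue requests, one worker thread drains them in FIFO order, so two
 * resets never execute at the same time. Build with: cc -pthread */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_REQ 64
#define MAX_CLIENTS 8

struct reset_domain {                 /* hypothetical type */
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int queue[MAX_REQ];               /* FIFO of requesting client ids */
	int head, tail;                   /* dequeue / enqueue counters */
	bool stopping;
	int running, max_running, done;   /* in flight, peak, completed */
};

struct client_arg { struct reset_domain *d; int id, n; };

/* Any client may call this concurrently; it never performs the reset. */
static bool reset_domain_queue(struct reset_domain *d, int client)
{
	bool ok = false;
	pthread_mutex_lock(&d->lock);
	if (!d->stopping && d->tail - d->head < MAX_REQ) {
		d->queue[d->tail++ % MAX_REQ] = client;
		ok = true;
		pthread_cond_signal(&d->cond);    /* wake the single worker */
	}
	pthread_mutex_unlock(&d->lock);
	return ok;
}

/* The only consumer: resets run strictly one after another. */
static void *reset_worker(void *arg)
{
	struct reset_domain *d = arg;
	pthread_mutex_lock(&d->lock);
	for (;;) {
		while (d->head == d->tail && !d->stopping)
			pthread_cond_wait(&d->cond, &d->lock);
		if (d->head == d->tail)           /* stopping and drained */
			break;
		int client = d->queue[d->head++ % MAX_REQ];
		(void)client;                     /* a real reset would use it */
		if (++d->running > d->max_running)
			d->max_running = d->running;
		pthread_mutex_unlock(&d->lock);
		/* ... the actual reset would run here, without the lock ... */
		pthread_mutex_lock(&d->lock);
		d->running--;
		d->done++;
	}
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

static void *client_thread(void *arg)
{
	struct client_arg *c = arg;

	for (int i = 0; i < c->n; i++)
		reset_domain_queue(c->d, c->id);
	return NULL;
}

/* Returns the peak number of resets observed running at once. */
int run_demo(int clients, int per_client, int *completed)
{
	struct reset_domain d = {0};
	pthread_t worker, tids[MAX_CLIENTS];
	struct client_arg args[MAX_CLIENTS];

	assert(clients <= MAX_CLIENTS && clients * per_client <= MAX_REQ);
	pthread_mutex_init(&d.lock, NULL);
	pthread_cond_init(&d.cond, NULL);
	pthread_create(&worker, NULL, reset_worker, &d);
	for (int i = 0; i < clients; i++) {
		args[i] = (struct client_arg){ .d = &d, .id = i, .n = per_client };
		pthread_create(&tids[i], NULL, client_thread, &args[i]);
	}
	for (int i = 0; i < clients; i++)
		pthread_join(tids[i], NULL);

	pthread_mutex_lock(&d.lock);          /* let the worker exit once drained */
	d.stopping = true;
	pthread_cond_signal(&d.cond);
	pthread_mutex_unlock(&d.lock);
	pthread_join(worker, NULL);

	*completed = d.done;
	pthread_mutex_destroy(&d.lock);
	pthread_cond_destroy(&d.cond);
	return d.max_running;
}
```

Calling run_demo(4, 8, &completed) has four client threads race to enqueue 32 requests, yet only the single worker dequeues, so the peak concurrency stays at 1 and all 32 resets complete. In this model, giving each partition its own queue, as the cover letter proposes, would only guarantee serialization within each queue rather than across the whole device.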


Thread overview: 9+ messages
2023-08-11  6:02 [PATCH 0/5] Add work pool to reset domain Lijo Lazar
2023-08-11  6:02 ` [PATCH 1/5] drm/amdgpu: " Lijo Lazar
2023-08-11  6:02 ` [PATCH 2/5] drm/amdgpu: Move to reset_schedule_work Lijo Lazar
2023-08-11  6:02 ` [PATCH 3/5] drm/amdgpu: Set flags to cancel all pending resets Lijo Lazar
2023-08-11  6:02 ` [PATCH 4/5] drm/amdgpu: Add API to queue and do reset work Lijo Lazar
2023-08-11  6:02 ` [PATCH 5/5] drm/amdgpu: Add TDR queue for ring Lijo Lazar
2023-08-12  8:23 ` Christian König [this message]
2023-08-12 17:08   ` [PATCH 0/5] Add work pool to reset domain Lazar, Lijo
2023-08-14 11:55     ` Christian König
