From: Ben Skeggs <bskeggs@nvidia.com>
To: Zhi Wang <zhiw@nvidia.com>, <nouveau@lists.freedesktop.org>
Cc: <airlied@gmail.com>, <daniel@ffwll.ch>, <dakr@kernel.org>,
	<acurrid@nvidia.com>, <cjia@nvidia.com>, <smitra@nvidia.com>,
	<ankita@nvidia.com>, <aniketa@nvidia.com>, <kwankhede@nvidia.com>,
	<targupta@nvidia.com>, <zhiwang@kernel.org>
Subject: Re: [PATCH 0/5] NVKM GSP RPC message handling policy
Date: Fri, 21 Feb 2025 06:48:04 +1000
Message-ID: <9cde5459-9225-4e15-b239-e4728ded1769@nvidia.com>
In-Reply-To: <20250207175806.78051-1-zhiw@nvidia.com>

On 8/2/25 03:58, Zhi Wang wrote:

> Ben reported that the patch [1] breaks suspend/resume.
>
> After digging for a while, I noticed that this problem had been there
> before that patch was introduced, but was not exposed because
> r535_gsp_rpc_push() doesn't respect the caller's requirement when handling
> a large RPC command: it won't wait for the reply even if the caller
> requires it. (Small RPCs are fine.)
>
> After that patch series is introduced, r535_gsp_rpc_push() really waits
> for the reply and receives the entire GSP message, which is required
> by the large vGPU RPC command.
>
> There are currently two GSP RPC message handling policies:
>
> - a. don't care: discard the message before returning to the caller.
> - b. receive the entire message: wait for and receive the entire message
>    before returning to the caller.
>
> On the path of suspend/resume, there is a large GSP command
> NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, which returns only a GSP RPC message
> header to tell the driver that the request is handled. The policy in the
> driver is to receive the entire message, which ends up with a timeout
> and error when r535_gsp_rpc_push() tries to receive the message. That
> breaks the suspend/resume path.
>
> This series factors out the current GSP RPC message handling policy and
> introduces a new policy for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY and a
> kernel doc to illustrate the policies.
>
> With this patchset, the problem can't be reproduced and suspend/resume
> works on my L40.

This seems to fix the issue here on top of current drm-misc-next.

Tested-by: Ben Skeggs <bskeggs@nvidia.com>

>
> [1] https://lore.kernel.org/nouveau/7eb31f1f-fc3a-4fb5-86cf-4bd011d68ff1@nvidia.com/T/#t
>
> Zhi Wang (5):
>    drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply()
>    drm/nouveau/nvkm: factor out the current RPC command reply policies
>    drm/nouveau/nvkm: introduce new GSP reply policy
>      NVKM_GSP_RPC_REPLY_POLL
>    drm/nouveau/nvkm: use the new policy for
>      NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY
>    drm/nouveau/nvkm: introduce a kernel doc for GSP message handling
>
>   Documentation/gpu/nouveau.rst                 |  3 +
>   .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 34 ++++++--
>   .../gpu/drm/nouveau/nvkm/subdev/bar/r535.c    |  2 +-
>   .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 80 +++++++++++--------
>   .../drm/nouveau/nvkm/subdev/instmem/r535.c    |  2 +-
>   5 files changed, 78 insertions(+), 43 deletions(-)
>
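
For readers following the thread, the three reply policies described in the
cover letter can be sketched roughly as below. This is only an illustrative
model, not the driver code: the enum values, struct fake_msg and
handle_reply() are hypothetical names made up for this sketch (only
NVKM_GSP_RPC_REPLY_POLL is named in the series), and the timeout is simulated
with a plain -1 return instead of a real wait loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy names, for illustration only. */
enum reply_policy {
	REPLY_NOWAIT,	/* a. don't care: discard the reply, return at once */
	REPLY_RECV,	/* b. wait for and receive the entire message */
	REPLY_POLL,	/* new: wait only for the header saying "handled" */
};

/* Hypothetical stand-in for what GSP has posted in reply to a command. */
struct fake_msg {
	bool header_seen;	/* a reply header has arrived */
	size_t payload_len;	/* payload bytes that actually arrived */
	size_t expected_len;	/* payload bytes the caller asked for */
};

/* Returns 0 on success, -1 on a (simulated) timeout. */
int handle_reply(enum reply_policy policy, const struct fake_msg *msg)
{
	switch (policy) {
	case REPLY_NOWAIT:
		/* The caller does not care about the reply at all. */
		return 0;
	case REPLY_RECV:
		/* A header-only reply never satisfies this policy. */
		if (!msg->header_seen || msg->payload_len < msg->expected_len)
			return -1;
		return 0;
	case REPLY_POLL:
		/* The header alone confirms the request was handled. */
		return msg->header_seen ? 0 : -1;
	}
	return -1;
}
```

In this model, a header-only reply such as the one described for
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY times out under the "receive the entire
message" policy but succeeds under the polling policy, which matches the
suspend/resume failure and fix described in the cover letter.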

