* [PATCH 0/5] NVKM GSP RPC message handling policy
@ 2025-02-07 17:58 Zhi Wang
From: Zhi Wang @ 2025-02-07 17:58 UTC
  To: nouveau
  Cc: airlied, daniel, dakr, bskeggs, acurrid, cjia, smitra, ankita,
	aniketa, kwankhede, targupta, zhiw, zhiwang

Ben reported that patch [1] breaks suspend/resume.

After digging for a while, I noticed that this problem existed before
that patch was introduced, but it was not exposed because
r535_gsp_rpc_push() doesn't respect the caller's requirement when handling
a large RPC command: it won't wait for the reply even if the caller
requires it. (Small RPCs are fine.)

After that patch series was introduced, r535_gsp_rpc_push() really waits
for the reply and receives the entire GSP message, which is required
by the large vGPU RPC command.

There are currently two GSP RPC message handling policies:

- a. don't care: discard the message before returning to the caller.
- b. receive the entire message: wait for and receive the entire message
  before returning to the caller.
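
As a rough illustration only, the two policies above can be modelled in
plain userspace C as below. The enum, struct and function names are
hypothetical and are not the identifiers used in the driver; the sketch
only shows how much of a reply reaches the caller under each policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: they only mirror the two behaviours listed above. */
enum toy_rpc_reply_policy {
	TOY_RPC_REPLY_DONT_CARE, /* (a) discard the reply before returning */
	TOY_RPC_REPLY_RECV_ALL,  /* (b) wait for and receive the whole reply */
};

/* Toy stand-in for a GSP reply: a header plus an optional payload. */
struct toy_rpc_reply {
	size_t header_len;
	size_t payload_len;
};

/* Return how many reply bytes are handed back to the caller. */
size_t toy_rpc_handle_reply(enum toy_rpc_reply_policy policy,
			    const struct toy_rpc_reply *reply)
{
	assert(reply != NULL);

	switch (policy) {
	case TOY_RPC_REPLY_DONT_CARE:
		/* The caller does not look at the reply at all. */
		return 0;
	case TOY_RPC_REPLY_RECV_ALL:
		/* The caller gets the header and the full payload. */
		return reply->header_len + reply->payload_len;
	}
	return 0;
}
```

With a 32-byte header and a 100-byte payload (made-up sizes), the first
policy hands back nothing and the second hands back all 132 bytes.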

On the suspend/resume path, there is a large GSP command,
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, which returns only a GSP RPC message
header to tell the driver that the request has been handled. The policy in
the driver is to receive the entire message, which ends in a timeout
and an error when r535_gsp_rpc_push() tries to receive the message. That
breaks the suspend/resume path.

This series factors out the current GSP RPC message handling policies,
introduces a new policy for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, and adds
a kernel doc that describes the policies.
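
To make the failure and the fix concrete, here is a minimal userspace
sketch, not the driver code. It assumes that the new policy
(NVKM_GSP_RPC_REPLY_POLL in the series) only waits for the reply header,
which is one reading of the description above. All toy_* names and the
byte counts are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical policy names. Only the POLL entry is modelled on a name
 * from the series (NVKM_GSP_RPC_REPLY_POLL).
 */
enum toy_reply_policy {
	TOY_REPLY_NOWAIT, /* discard the reply */
	TOY_REPLY_RECV,   /* wait for header and payload */
	TOY_REPLY_POLL,   /* assumption: wait for the header only */
};

/* Bytes that must have arrived before the sender may return. */
size_t toy_bytes_to_wait_for(enum toy_reply_policy policy,
			     size_t header_len, size_t payload_len)
{
	switch (policy) {
	case TOY_REPLY_NOWAIT:
		return 0;
	case TOY_REPLY_RECV:
		return header_len + payload_len;
	case TOY_REPLY_POLL:
		return header_len;
	}
	return 0;
}

/*
 * Poll a toy reply queue in which `avail` bytes have arrived and no more
 * ever will. Returns 0 once `want` bytes are present, or -ETIMEDOUT after
 * `max_polls` attempts.
 */
int toy_wait(size_t avail, size_t want, int max_polls)
{
	assert(max_polls > 0);

	for (int i = 0; i < max_polls; i++) {
		if (avail >= want)
			return 0;
		/* A real driver would sleep or delay before the next poll. */
	}
	return -ETIMEDOUT;
}
```

For a header-only reply (only the header bytes ever arrive), waiting with
the receive-everything policy returns -ETIMEDOUT, while the header-only
policy completes. That matches the suspend/resume failure described above
and the intent of the new policy.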

With this patchset, the problem can no longer be reproduced and
suspend/resume works on my L40.

[1] https://lore.kernel.org/nouveau/7eb31f1f-fc3a-4fb5-86cf-4bd011d68ff1@nvidia.com/T/#t

Zhi Wang (5):
  drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply()
  drm/nouveau/nvkm: factor out the current RPC command reply policies
  drm/nouveau/nvkm: introduce new GSP reply policy
    NVKM_GSP_RPC_REPLY_POLL
  drm/nouveau/nvkm: use the new policy for
    NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY
  drm/nouveau/nvkm: introduce a kernel doc for GSP message handling

 Documentation/gpu/nouveau.rst                 |  3 +
 .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 34 ++++++--
 .../gpu/drm/nouveau/nvkm/subdev/bar/r535.c    |  2 +-
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 80 +++++++++++--------
 .../drm/nouveau/nvkm/subdev/instmem/r535.c    |  2 +-
 5 files changed, 78 insertions(+), 43 deletions(-)

-- 
2.43.5

