All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] NVKM GSP RPC message handling policy
@ 2025-02-07 17:58 Zhi Wang
  2025-02-07 17:58 ` [PATCH 1/5] drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply() Zhi Wang
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Zhi Wang @ 2025-02-07 17:58 UTC (permalink / raw)
  To: nouveau
  Cc: airlied, daniel, dakr, bskeggs, acurrid, cjia, smitra, ankita,
	aniketa, kwankhede, targupta, zhiw, zhiwang

Ben reported an issue that the patch [1] breaks the suspend/resume.

After digging for a while, I noticed that this problem had been there
before introducing that patch, but not exposed because r535_gsp_rpc_push()
doesn't repsect the caller's requirement when handling the large RPC
command: It won't wait for the reply even the caller requires. (Small
RPCs are fine.)

After that patch series is introduced, r535_gsp_rpc_push() really waits
for the reply and receives the entire GSP message, which is required
by the large vGPU RPC command.

There are currently two GSP RPC message handling policy:

- a. dont care. discard the message before returning to the caller.
- b. receive the entire message. wait and receive the entire message before
  returning to the caller.

On the path of suspend/resume, there is a large GSP command
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, which returns only a GSP RPC message
header to tell the driver that the request is handled. The policy in the
driver is to receive the entrie message, which ends up with a timeout
and error when r535_gsp_rpc_push() tries to receive the message. That
breaks the suspend/resume path.

This series factors out the current GSP RPC message handling policy and
introduces a new policy for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY and a
kernel doc to illustrate the policies.

With this patchset, the problem can't be reproduced and suspend/resume
works on my L40.

[1] https://lore.kernel.org/nouveau/7eb31f1f-fc3a-4fb5-86cf-4bd011d68ff1@nvidia.com/T/#t

Zhi Wang (5):
  drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply()
  drm/nouveau/nvkm: factor out the current RPC command reply policies
  drm/nouveau/nvkm: introduce new GSP reply policy
    NVKM_GSP_RPC_REPLY_POLL
  drm/nouveau/nvkm: use the new policy for
    NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY
  drm/nouveau/nvkm: introduce a kernel doc for GSP message handling

 Documentation/gpu/nouveau.rst                 |  3 +
 .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 34 ++++++--
 .../gpu/drm/nouveau/nvkm/subdev/bar/r535.c    |  2 +-
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 80 +++++++++++--------
 .../drm/nouveau/nvkm/subdev/instmem/r535.c    |  2 +-
 5 files changed, 78 insertions(+), 43 deletions(-)

-- 
2.43.5


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/5] drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply()
  2025-02-07 17:58 [PATCH 0/5] NVKM GSP RPC message handling policy Zhi Wang
@ 2025-02-07 17:58 ` Zhi Wang
  2025-02-12  6:32   ` Alexandre Courbot
  2025-02-07 17:58 ` [PATCH 2/5] drm/nouveau/nvkm: factor out the current RPC command reply policies Zhi Wang
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Zhi Wang @ 2025-02-07 17:58 UTC (permalink / raw)
  To: nouveau
  Cc: airlied, daniel, dakr, bskeggs, acurrid, cjia, smitra, ankita,
	aniketa, kwankhede, targupta, zhiw, zhiwang

There can be multiple cases of handling the GSP RPC messages, which are
the reply of GSP RPC commands according to the requirement of the
callers and the nature of the GSP RPC commands.

To support all cases, first, centralize the handling of the reply in a
single function. Factor out r535_gsp_rpc_handle_reply().

No functional change is intended for small GSP RPC commands. For large GSP
commands, the caller decides the policy of how to handle the returned GSP
RPC message.

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 48 +++++++++----------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 2075cad63805..1ed7d5624a56 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -583,14 +583,30 @@ r535_gsp_rpc_poll(struct nvkm_gsp *gsp, u32 fn)
 	return 0;
 }
 
+static void *
+r535_gsp_rpc_handle_reply(struct nvkm_gsp *gsp, u32 fn, bool wait,
+			  u32 gsp_rpc_len)
+{
+	struct nvfw_gsp_rpc *msg;
+	void *repv = NULL;
+
+	if (wait) {
+		msg = r535_gsp_msg_recv(gsp, fn, gsp_rpc_len);
+		if (!IS_ERR_OR_NULL(msg))
+			repv = msg->data;
+		else
+			repv = msg;
+	}
+
+	return repv;
+}
+
 static void *
 r535_gsp_rpc_send(struct nvkm_gsp *gsp, void *payload, bool wait,
 		  u32 gsp_rpc_len)
 {
 	struct nvfw_gsp_rpc *rpc = to_gsp_hdr(payload, rpc);
-	struct nvfw_gsp_rpc *msg;
-	u32 fn = rpc->function;
-	void *repv = NULL;
+	u32 function = rpc->function;
 	int ret;
 
 	if (gsp->subdev.debug >= NV_DBG_TRACE) {
@@ -604,15 +620,7 @@ r535_gsp_rpc_send(struct nvkm_gsp *gsp, void *payload, bool wait,
 	if (ret)
 		return ERR_PTR(ret);
 
-	if (wait) {
-		msg = r535_gsp_msg_recv(gsp, fn, gsp_rpc_len);
-		if (!IS_ERR_OR_NULL(msg))
-			repv = msg->data;
-		else
-			repv = msg;
-	}
-
-	return repv;
+	return r535_gsp_rpc_handle_reply(gsp, function, wait, gsp_rpc_len);
 }
 
 static void
@@ -996,18 +1004,10 @@ r535_gsp_rpc_push(struct nvkm_gsp *gsp, void *payload, bool wait,
 		}
 
 		/* Wait for reply. */
-		rpc = r535_gsp_msg_recv(gsp, fn, payload_size +
-					sizeof(*rpc));
-		if (!IS_ERR_OR_NULL(rpc)) {
-			if (wait) {
-				repv = rpc->data;
-			} else {
-				nvkm_gsp_rpc_done(gsp, rpc);
-				repv = NULL;
-			}
-		} else {
-			repv = wait ? rpc : NULL;
-		}
+		repv = r535_gsp_rpc_handle_reply(gsp, fn, wait, payload_size +
+						 sizeof(*rpc));
+		if (IS_ERR_OR_NULL(repv))
+			repv = wait ? repv : NULL;
 	} else {
 		repv = r535_gsp_rpc_send(gsp, payload, wait, gsp_rpc_len);
 	}
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/5] drm/nouveau/nvkm: factor out the current RPC command reply policies
  2025-02-07 17:58 [PATCH 0/5] NVKM GSP RPC message handling policy Zhi Wang
  2025-02-07 17:58 ` [PATCH 1/5] drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply() Zhi Wang
@ 2025-02-07 17:58 ` Zhi Wang
  2025-02-12  6:55   ` Alexandre Courbot
  2025-02-07 17:58 ` [PATCH 3/5] drm/nouveau/nvkm: introduce new GSP reply policy NVKM_GSP_RPC_REPLY_POLL Zhi Wang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Zhi Wang @ 2025-02-07 17:58 UTC (permalink / raw)
  To: nouveau
  Cc: airlied, daniel, dakr, bskeggs, acurrid, cjia, smitra, ankita,
	aniketa, kwankhede, targupta, zhiw, zhiwang

There can be multiple cases of handling the GSP RPC messages, which are
the reply of GSP RPC commands according to the requirement of the callers
and the nature of the GSP RPC commands.

The current supported reply policies are "callers don't care" or "receive
the entire message" according to the requirement of the caller. For
introducing a new policy, factor out the current RPC command reply
polices.

Factor out NVKM_GSP_RPC_REPLY_RECV as "receive the entire message". If
none is specified, the default is "callers don't care".

No functional change is intended.

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 16 +++++---
 .../gpu/drm/nouveau/nvkm/subdev/bar/r535.c    |  2 +-
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 41 +++++++++++--------
 .../drm/nouveau/nvkm/subdev/instmem/r535.c    |  2 +-
 4 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
index 746e126c3ecf..c467e44cab47 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
@@ -31,6 +31,10 @@ typedef int (*nvkm_gsp_msg_ntfy_func)(void *priv, u32 fn, void *repv, u32 repc);
 struct nvkm_gsp_event;
 typedef void (*nvkm_gsp_event_func)(struct nvkm_gsp_event *, void *repv, u32 repc);
 
+enum {
+	NVKM_GSP_RPC_REPLY_RECV = 1,
+};
+
 struct nvkm_gsp {
 	const struct nvkm_gsp_func *func;
 	struct nvkm_subdev subdev;
@@ -188,7 +192,7 @@ struct nvkm_gsp {
 
 	const struct nvkm_gsp_rm {
 		void *(*rpc_get)(struct nvkm_gsp *, u32 fn, u32 argc);
-		void *(*rpc_push)(struct nvkm_gsp *, void *argv, bool wait, u32 repc);
+		void *(*rpc_push)(struct nvkm_gsp *gsp, void *argv, int reply, u32 repc);
 		void (*rpc_done)(struct nvkm_gsp *gsp, void *repv);
 
 		void *(*rm_ctrl_get)(struct nvkm_gsp_object *, u32 cmd, u32 argc);
@@ -255,9 +259,9 @@ nvkm_gsp_rpc_get(struct nvkm_gsp *gsp, u32 fn, u32 argc)
 }
 
 static inline void *
-nvkm_gsp_rpc_push(struct nvkm_gsp *gsp, void *argv, bool wait, u32 repc)
+nvkm_gsp_rpc_push(struct nvkm_gsp *gsp, void *argv, int reply, u32 repc)
 {
-	return gsp->rm->rpc_push(gsp, argv, wait, repc);
+	return gsp->rm->rpc_push(gsp, argv, reply, repc);
 }
 
 static inline void *
@@ -268,13 +272,13 @@ nvkm_gsp_rpc_rd(struct nvkm_gsp *gsp, u32 fn, u32 argc)
 	if (IS_ERR_OR_NULL(argv))
 		return argv;
 
-	return nvkm_gsp_rpc_push(gsp, argv, true, argc);
+	return nvkm_gsp_rpc_push(gsp, argv, NVKM_GSP_RPC_REPLY_RECV, argc);
 }
 
 static inline int
-nvkm_gsp_rpc_wr(struct nvkm_gsp *gsp, void *argv, bool wait)
+nvkm_gsp_rpc_wr(struct nvkm_gsp *gsp, void *argv, int reply)
 {
-	void *repv = nvkm_gsp_rpc_push(gsp, argv, wait, 0);
+	void *repv = nvkm_gsp_rpc_push(gsp, argv, reply, 0);
 
 	if (IS_ERR(repv))
 		return PTR_ERR(repv);
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c
index 3a30bea30e36..90186f98065c 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c
@@ -56,7 +56,7 @@ r535_bar_bar2_update_pde(struct nvkm_gsp *gsp, u64 addr)
 	rpc->info.entryValue = addr ? ((addr >> 4) | 2) : 0; /* PD3 entry format! */
 	rpc->info.entryLevelShift = 47; //XXX: probably fetch this from mmu!
 
-	return nvkm_gsp_rpc_wr(gsp, rpc, true);
+	return nvkm_gsp_rpc_wr(gsp, rpc, NVKM_GSP_RPC_REPLY_RECV);
 }
 
 static void
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 1ed7d5624a56..bc8eb9a3cb28 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -584,25 +584,32 @@ r535_gsp_rpc_poll(struct nvkm_gsp *gsp, u32 fn)
 }
 
 static void *
-r535_gsp_rpc_handle_reply(struct nvkm_gsp *gsp, u32 fn, bool wait,
+r535_gsp_rpc_handle_reply(struct nvkm_gsp *gsp, u32 fn, int reply,
 			  u32 gsp_rpc_len)
 {
 	struct nvfw_gsp_rpc *msg;
 	void *repv = NULL;
 
-	if (wait) {
+	if (!reply)
+		return NULL;
+
+	switch (reply) {
+	case NVKM_GSP_RPC_REPLY_RECV:
 		msg = r535_gsp_msg_recv(gsp, fn, gsp_rpc_len);
 		if (!IS_ERR_OR_NULL(msg))
 			repv = msg->data;
 		else
 			repv = msg;
+		break;
+	default:
+		repv = ERR_PTR(-EINVAL);
+		break;
 	}
-
 	return repv;
 }
 
 static void *
-r535_gsp_rpc_send(struct nvkm_gsp *gsp, void *payload, bool wait,
+r535_gsp_rpc_send(struct nvkm_gsp *gsp, void *payload, int reply,
 		  u32 gsp_rpc_len)
 {
 	struct nvfw_gsp_rpc *rpc = to_gsp_hdr(payload, rpc);
@@ -620,7 +627,7 @@ r535_gsp_rpc_send(struct nvkm_gsp *gsp, void *payload, bool wait,
 	if (ret)
 		return ERR_PTR(ret);
 
-	return r535_gsp_rpc_handle_reply(gsp, function, wait, gsp_rpc_len);
+	return r535_gsp_rpc_handle_reply(gsp, function, reply, gsp_rpc_len);
 }
 
 static void
@@ -804,7 +811,7 @@ r535_gsp_rpc_rm_free(struct nvkm_gsp_object *object)
 	rpc->params.hRoot = client->object.handle;
 	rpc->params.hObjectParent = 0;
 	rpc->params.hObjectOld = object->handle;
-	return nvkm_gsp_rpc_wr(gsp, rpc, true);
+	return nvkm_gsp_rpc_wr(gsp, rpc, NVKM_GSP_RPC_REPLY_RECV);
 }
 
 static void
@@ -822,7 +829,7 @@ r535_gsp_rpc_rm_alloc_push(struct nvkm_gsp_object *object, void *params)
 	struct nvkm_gsp *gsp = object->client->gsp;
 	void *ret = NULL;
 
-	rpc = nvkm_gsp_rpc_push(gsp, rpc, true, sizeof(*rpc));
+	rpc = nvkm_gsp_rpc_push(gsp, rpc, NVKM_GSP_RPC_REPLY_RECV, sizeof(*rpc));
 	if (IS_ERR_OR_NULL(rpc))
 		return rpc;
 
@@ -883,7 +890,7 @@ r535_gsp_rpc_rm_ctrl_push(struct nvkm_gsp_object *object, void **params, u32 rep
 	struct nvkm_gsp *gsp = object->client->gsp;
 	int ret = 0;
 
-	rpc = nvkm_gsp_rpc_push(gsp, rpc, true, repc);
+	rpc = nvkm_gsp_rpc_push(gsp, rpc, NVKM_GSP_RPC_REPLY_RECV, repc);
 	if (IS_ERR_OR_NULL(rpc)) {
 		*params = NULL;
 		return PTR_ERR(rpc);
@@ -955,7 +962,7 @@ r535_gsp_rpc_get(struct nvkm_gsp *gsp, u32 fn, u32 payload_size)
 }
 
 static void *
-r535_gsp_rpc_push(struct nvkm_gsp *gsp, void *payload, bool wait,
+r535_gsp_rpc_push(struct nvkm_gsp *gsp, void *payload, int reply,
 		  u32 gsp_rpc_len)
 {
 	struct nvfw_gsp_rpc *rpc = to_gsp_hdr(payload, rpc);
@@ -974,7 +981,7 @@ r535_gsp_rpc_push(struct nvkm_gsp *gsp, void *payload, bool wait,
 		rpc->length = sizeof(*rpc) + max_payload_size;
 		msg->checksum = rpc->length;
 
-		repv = r535_gsp_rpc_send(gsp, payload, false, 0);
+		repv = r535_gsp_rpc_send(gsp, payload, 0, 0);
 		if (IS_ERR(repv))
 			goto done;
 
@@ -995,7 +1002,7 @@ r535_gsp_rpc_push(struct nvkm_gsp *gsp, void *payload, bool wait,
 
 			memcpy(next, payload, size);
 
-			repv = r535_gsp_rpc_send(gsp, next, false, 0);
+			repv = r535_gsp_rpc_send(gsp, next, 0, 0);
 			if (IS_ERR(repv))
 				goto done;
 
@@ -1004,12 +1011,12 @@ r535_gsp_rpc_push(struct nvkm_gsp *gsp, void *payload, bool wait,
 		}
 
 		/* Wait for reply. */
-		repv = r535_gsp_rpc_handle_reply(gsp, fn, wait, payload_size +
+		repv = r535_gsp_rpc_handle_reply(gsp, fn, reply, payload_size +
 						 sizeof(*rpc));
 		if (IS_ERR_OR_NULL(repv))
-			repv = wait ? repv : NULL;
+			repv = reply ? repv : NULL;
 	} else {
-		repv = r535_gsp_rpc_send(gsp, payload, wait, gsp_rpc_len);
+		repv = r535_gsp_rpc_send(gsp, payload, reply, gsp_rpc_len);
 	}
 
 done:
@@ -1326,7 +1333,7 @@ r535_gsp_rpc_unloading_guest_driver(struct nvkm_gsp *gsp, bool suspend)
 		rpc->newLevel = NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_0;
 	}
 
-	return nvkm_gsp_rpc_wr(gsp, rpc, true);
+	return nvkm_gsp_rpc_wr(gsp, rpc, NVKM_GSP_RPC_REPLY_RECV);
 }
 
 enum registry_type {
@@ -1683,7 +1690,7 @@ r535_gsp_rpc_set_registry(struct nvkm_gsp *gsp)
 
 	build_registry(gsp, rpc);
 
-	return nvkm_gsp_rpc_wr(gsp, rpc, false);
+	return nvkm_gsp_rpc_wr(gsp, rpc, 0);
 
 fail:
 	clean_registry(gsp);
@@ -1892,7 +1899,7 @@ r535_gsp_rpc_set_system_info(struct nvkm_gsp *gsp)
 	info->pciConfigMirrorSize = 0x001000;
 	r535_gsp_acpi_info(gsp, &info->acpiMethodData);
 
-	return nvkm_gsp_rpc_wr(gsp, info, false);
+	return nvkm_gsp_rpc_wr(gsp, info, 0);
 }
 
 static int
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c
index 5f3c9c02a4c0..2789efe9c100 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c
@@ -105,7 +105,7 @@ fbsr_memlist(struct nvkm_gsp_device *device, u32 handle, enum nvkm_memory_target
 			rpc->pteDesc.pte_pde[i].pte = (phys >> 12) + i;
 	}
 
-	ret = nvkm_gsp_rpc_wr(gsp, rpc, true);
+	ret = nvkm_gsp_rpc_wr(gsp, rpc, NVKM_GSP_RPC_REPLY_RECV);
 	if (ret)
 		return ret;
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/5] drm/nouveau/nvkm: introduce new GSP reply policy NVKM_GSP_RPC_REPLY_POLL
  2025-02-07 17:58 [PATCH 0/5] NVKM GSP RPC message handling policy Zhi Wang
  2025-02-07 17:58 ` [PATCH 1/5] drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply() Zhi Wang
  2025-02-07 17:58 ` [PATCH 2/5] drm/nouveau/nvkm: factor out the current RPC command reply policies Zhi Wang
@ 2025-02-07 17:58 ` Zhi Wang
  2025-02-12  6:57   ` Alexandre Courbot
  2025-02-07 17:58 ` [PATCH 4/5] drm/nouveau/nvkm: use the new policy for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY Zhi Wang
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Zhi Wang @ 2025-02-07 17:58 UTC (permalink / raw)
  To: nouveau
  Cc: airlied, daniel, dakr, bskeggs, acurrid, cjia, smitra, ankita,
	aniketa, kwankhede, targupta, zhiw, zhiwang

There can be multiple cases of handling the GSP RPC messages, which
are the reply of GSP RPC commands according to the requirement of the
callers and the nature of the GSP RPC commands.

Some GSP RPC command needs a new reply policy: "caller don't care about
the message content but want to make sure a reply is received". To
support this case, a new reply policy is introduced.

Introduce new reply policy NVKM_GSP_RPC_REPLY_POLL, which waits for the
returned GSP message but discards it for the caller.

No functional change is intended.

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
index c467e44cab47..bc16510261b8 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
@@ -33,6 +33,7 @@ typedef void (*nvkm_gsp_event_func)(struct nvkm_gsp_event *, void *repv, u32 rep
 
 enum {
 	NVKM_GSP_RPC_REPLY_RECV = 1,
+	NVKM_GSP_RPC_REPLY_POLL,
 };
 
 struct nvkm_gsp {
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index bc8eb9a3cb28..af2836cca38f 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -601,6 +601,9 @@ r535_gsp_rpc_handle_reply(struct nvkm_gsp *gsp, u32 fn, int reply,
 		else
 			repv = msg;
 		break;
+	case NVKM_GSP_RPC_REPLY_POLL:
+		repv = r535_gsp_msg_recv(gsp, fn, 0);
+		break;
 	default:
 		repv = ERR_PTR(-EINVAL);
 		break;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/5] drm/nouveau/nvkm: use the new policy for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY
  2025-02-07 17:58 [PATCH 0/5] NVKM GSP RPC message handling policy Zhi Wang
                   ` (2 preceding siblings ...)
  2025-02-07 17:58 ` [PATCH 3/5] drm/nouveau/nvkm: introduce new GSP reply policy NVKM_GSP_RPC_REPLY_POLL Zhi Wang
@ 2025-02-07 17:58 ` Zhi Wang
  2025-02-07 17:58 ` [PATCH 5/5] drm/nouveau/nvkm: introduce a kernel doc for GSP message handling Zhi Wang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Zhi Wang @ 2025-02-07 17:58 UTC (permalink / raw)
  To: nouveau
  Cc: airlied, daniel, dakr, bskeggs, acurrid, cjia, smitra, ankita,
	aniketa, kwankhede, targupta, zhiw, zhiwang

NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY is a large GSP RPC command. The actual
required policy is NVKM_GSP_RPC_REPLY_POLL. This can be observed from the
dump of the GSP message queue. After the large GSP RPC command is issued,
GSP will write only an empty RPC header in the queue as the reply.

Without this change, the policy "receiving the entire message" is used
for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY. This causes the timeout of receiving
the returned GSP message in the suspend/resume path:

[   80.683646] r535_gsp_rpc_push() - 962: rpc->function 4 gsp_rpc_len 0 payload_size 2c630 max_payload_size ffb0
[   80.704222] r535_gsp_msg_recv() - 501: recv rpc->fn 4, rpc->length 20
[   81.014566] mlx5_core 0000:01:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[   83.384132] mlx5_core 0000:01:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[  103.784986] ------------[ cut here ]------------
[  103.789620] WARNING: CPU: 6 PID: 246 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:201 r535_gsp_msgq_wait+0x8c/0xa0 [nouveau]
[  103.801441]  libcxgbi(E) libcxgb(E) qla4xxx(E) iscsi_boot_sysfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) fuse(E)
[  103.903122] CPU: 6 UID: 0 PID: 246 Comm: kworker/u130:30 Tainted: G            E      6.14.0-rc1+ #1
[  103.912254] Tainted: [E]=UNSIGNED_MODULE
[  103.916193] Hardware name: ASRockRack 1U1G-MILAN/N/ROMED8-NL, BIOS L3.12E 09/06/2022
[  103.923940] Workqueue: async async_run_entry_fn
[  103.928486] RIP: 0010:r535_gsp_msgq_wait+0x8c/0xa0 [nouveau]
[  103.934372] Code: 00 00 49 8b 94 24 e8 08 00 00 8b 12 29 ea 01 d0 0f 43 c2 39 d8 72 c8 41 8b 55 00 85 d2 74 0b 5b 5d 41 5c 41 5d e9 cf 0c 43 e6 <0f> 0b b8 92 ff ff ff 5b 5d 41 5c 41 5d e9 bd 0c 43 e6 66 90 90 90
[  103.953140] RSP: 0018:ffffb81a40baf970 EFLAGS: 00010246
[  103.958381] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000009651f0
[  103.965524] RDX: 0000000000000000 RSI: 0000000055555554 RDI: ffffb81a40baf8c0
[  103.972663] RBP: 000000000000001a R08: 0000000000000001 R09: 0000000000000000
[  103.979805] R10: 0000000000000000 R11: ffff97f70e33424c R12: ffff97d848d40000
[  103.986948] R13: ffffb81a40baf9fc R14: ffff97d848d40000 R15: 0000000000000000
[  103.994090] FS:  0000000000000000(0000) GS:ffff97f70e300000(0000) knlGS:0000000000000000
[  104.002187] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  104.007947] CR2: 00000a8bdc3b8000 CR3: 000000016e738001 CR4: 0000000000770ef0
[  104.015087] PKRU: 55555554
[  104.017809] Call Trace:
[  104.020268]  <TASK>
[  104.022382]  ? __warn+0x84/0x130
[  104.025627]  ? r535_gsp_msgq_wait+0x8c/0xa0 [nouveau]
[  104.030889]  ? report_bug+0x18a/0x1a0
[  104.034561]  ? handle_bug+0x53/0x90
[  104.038061]  ? exc_invalid_op+0x14/0x70
[  104.041910]  ? asm_exc_invalid_op+0x16/0x20
[  104.046109]  ? r535_gsp_msgq_wait+0x8c/0xa0 [nouveau]
[  104.051351]  r535_gsp_msgq_recv+0x13c/0x1e0 [nouveau]
[  104.056588]  r535_gsp_msg_recv+0xa9/0x260 [nouveau]
[  104.061654]  r535_gsp_rpc_push+0x12c/0x1b0 [nouveau]
[  104.066805]  fbsr_memlist+0x13a/0x1c0 [nouveau]
[  104.071564]  r535_instmem_suspend+0x3e4/0x720 [nouveau]
[  104.076997]  ? srso_alias_return_thunk+0x5/0xfbef5
[  104.081807]  ? prb_read+0x6f/0x150
[  104.085225]  ? nvkm_instmem_fini+0x25/0x60 [nouveau]
[  104.090383]  nvkm_instmem_fini+0x25/0x60 [nouveau]
[  104.095371]  nvkm_subdev_fini+0x66/0x150 [nouveau]
[  104.100353]  ? down_write+0xe/0x60
[  104.103765]  nvkm_device_fini+0x94/0x1e0 [nouveau]
[  104.108808]  nvkm_udevice_fini+0x4f/0x70 [nouveau]
[  104.113831]  nvkm_object_fini+0xb8/0x240 [nouveau]
[  104.118814]  nvkm_object_fini+0x6e/0x240 [nouveau]
[  104.123788]  nouveau_do_suspend+0xf9/0x210 [nouveau]
[  104.128997]  nouveau_pmops_suspend+0x39/0x80 [nouveau]

Use the new policy NVKM_GSP_RPC_REPLY_POLL on the GSP RPC command
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY.

Cc: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c
index 2789efe9c100..35ba1798ee6e 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/r535.c
@@ -105,7 +105,7 @@ fbsr_memlist(struct nvkm_gsp_device *device, u32 handle, enum nvkm_memory_target
 			rpc->pteDesc.pte_pde[i].pte = (phys >> 12) + i;
 	}
 
-	ret = nvkm_gsp_rpc_wr(gsp, rpc, NVKM_GSP_RPC_REPLY_RECV);
+	ret = nvkm_gsp_rpc_wr(gsp, rpc, NVKM_GSP_RPC_REPLY_POLL);
 	if (ret)
 		return ret;
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/5] drm/nouveau/nvkm: introduce a kernel doc for GSP message handling
  2025-02-07 17:58 [PATCH 0/5] NVKM GSP RPC message handling policy Zhi Wang
                   ` (3 preceding siblings ...)
  2025-02-07 17:58 ` [PATCH 4/5] drm/nouveau/nvkm: use the new policy for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY Zhi Wang
@ 2025-02-07 17:58 ` Zhi Wang
  2025-02-12  6:59   ` Alexandre Courbot
  2025-02-12 13:36 ` [PATCH 0/5] NVKM GSP RPC message handling policy Danilo Krummrich
  2025-02-20 20:48 ` Ben Skeggs
  6 siblings, 1 reply; 15+ messages in thread
From: Zhi Wang @ 2025-02-07 17:58 UTC (permalink / raw)
  To: nouveau
  Cc: airlied, daniel, dakr, bskeggs, acurrid, cjia, smitra, ankita,
	aniketa, kwankhede, targupta, zhiw, zhiwang

Introduce a kernel doc to describe the GSP message handling policy.

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 Documentation/gpu/nouveau.rst                   |  3 +++
 .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h   | 17 +++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/Documentation/gpu/nouveau.rst b/Documentation/gpu/nouveau.rst
index 0f34131ccc27..b8c801e0068c 100644
--- a/Documentation/gpu/nouveau.rst
+++ b/Documentation/gpu/nouveau.rst
@@ -27,3 +27,6 @@ GSP Support
 
 .. kernel-doc:: drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
    :doc: GSP message queue element
+
+.. kernel-doc:: drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
+   :doc: GSP message handling policy
diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
index bc16510261b8..2d0b80a733d7 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
@@ -31,6 +31,23 @@ typedef int (*nvkm_gsp_msg_ntfy_func)(void *priv, u32 fn, void *repv, u32 repc);
 struct nvkm_gsp_event;
 typedef void (*nvkm_gsp_event_func)(struct nvkm_gsp_event *, void *repv, u32 repc);
 
+/**
+ * DOC: GSP message handling policy
+ *
+ * When sending a GSP RPC command, there can be multiple cases of handling
+ * the GSP RPC messages, which are the reply of GSP RPC commands, according
+ * to the requirement of the callers and the nature of the GSP RPC commands.
+ *
+ * If none is specified, the policy is the callers don't care. Immediately
+ * return to the caller after the GSP RPC command is issued.
+ *
+ * NVKM_GSP_RPC_REPLY_RECV - If specified, wait and receive the entire GSP
+ * RPC message after the GSP RPC command is issued.
+ *
+ * NVKM_GSP_RPC_REPLY_POLL - If specified, wait for the specific reply and
+ * discard the reply before returning to the caller.
+ *
+ */
 enum {
 	NVKM_GSP_RPC_REPLY_RECV = 1,
 	NVKM_GSP_RPC_REPLY_POLL,
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply()
  2025-02-07 17:58 ` [PATCH 1/5] drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply() Zhi Wang
@ 2025-02-12  6:32   ` Alexandre Courbot
  0 siblings, 0 replies; 15+ messages in thread
From: Alexandre Courbot @ 2025-02-12  6:32 UTC (permalink / raw)
  To: Zhi Wang, nouveau
  Cc: airlied, daniel, dakr, bskeggs, acurrid, cjia, smitra, ankita,
	aniketa, kwankhede, targupta, zhiwang, Nouveau

Hi Zhi,

On Sat Feb 8, 2025 at 2:58 AM JST, Zhi Wang wrote:
> There can be multiple cases of handling the GSP RPC messages, which are
> the reply of GSP RPC commands according to the requirement of the
> callers and the nature of the GSP RPC commands.
>
> To support all cases, first, centralize the handling of the reply in a
> single function. Factor out r535_gsp_rpc_handle_reply().
>
> No functional change is intended for small GSP RPC commands. For large GSP
> commands, the caller decides the policy of how to handle the returned GSP
> RPC message.
>
> Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> ---
>  .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 48 +++++++++----------
>  1 file changed, 24 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> index 2075cad63805..1ed7d5624a56 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> @@ -583,14 +583,30 @@ r535_gsp_rpc_poll(struct nvkm_gsp *gsp, u32 fn)
>  	return 0;
>  }
>  
> +static void *
> +r535_gsp_rpc_handle_reply(struct nvkm_gsp *gsp, u32 fn, bool wait,
> +			  u32 gsp_rpc_len)
> +{
> +	struct nvfw_gsp_rpc *msg;
> +	void *repv = NULL;
> +
> +	if (wait) {
> +		msg = r535_gsp_msg_recv(gsp, fn, gsp_rpc_len);
> +		if (!IS_ERR_OR_NULL(msg))
> +			repv = msg->data;
> +		else
> +			repv = msg;
> +	}
> +
> +	return repv;
> +}
> +
>  static void *
>  r535_gsp_rpc_send(struct nvkm_gsp *gsp, void *payload, bool wait,
>  		  u32 gsp_rpc_len)
>  {
>  	struct nvfw_gsp_rpc *rpc = to_gsp_hdr(payload, rpc);
> -	struct nvfw_gsp_rpc *msg;
> -	u32 fn = rpc->function;
> -	void *repv = NULL;
> +	u32 function = rpc->function;

Is it necessary to rename 'fn' here? You don't do it in
r535_gsp_rpc_push()...

>  	int ret;
>  
>  	if (gsp->subdev.debug >= NV_DBG_TRACE) {
> @@ -604,15 +620,7 @@ r535_gsp_rpc_send(struct nvkm_gsp *gsp, void *payload, bool wait,
>  	if (ret)
>  		return ERR_PTR(ret);
>  
> -	if (wait) {
> -		msg = r535_gsp_msg_recv(gsp, fn, gsp_rpc_len);
> -		if (!IS_ERR_OR_NULL(msg))
> -			repv = msg->data;
> -		else
> -			repv = msg;
> -	}
> -
> -	return repv;
> +	return r535_gsp_rpc_handle_reply(gsp, function, wait, gsp_rpc_len);
>  }
>  
>  static void
> @@ -996,18 +1004,10 @@ r535_gsp_rpc_push(struct nvkm_gsp *gsp, void *payload, bool wait,
>  		}
>  
>  		/* Wait for reply. */
> -		rpc = r535_gsp_msg_recv(gsp, fn, payload_size +
> -					sizeof(*rpc));
> -		if (!IS_ERR_OR_NULL(rpc)) {
> -			if (wait) {
> -				repv = rpc->data;
> -			} else {
> -				nvkm_gsp_rpc_done(gsp, rpc);
> -				repv = NULL;
> -			}
> -		} else {
> -			repv = wait ? rpc : NULL;
> -		}
> +		repv = r535_gsp_rpc_handle_reply(gsp, fn, wait, payload_size +
> +						 sizeof(*rpc));
> +		if (IS_ERR_OR_NULL(repv))
> +			repv = wait ? repv : NULL;

I'm not familiar with this code, so maybe that's nothing, but is it ok
to drop the call to nvkm_gsp_rpc_done() that was done in the original
code if wait == false?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] drm/nouveau/nvkm: factor out the current RPC command reply policies
  2025-02-07 17:58 ` [PATCH 2/5] drm/nouveau/nvkm: factor out the current RPC command reply policies Zhi Wang
@ 2025-02-12  6:55   ` Alexandre Courbot
  2025-02-12 19:56     ` Zhi Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Alexandre Courbot @ 2025-02-12  6:55 UTC (permalink / raw)
  To: Zhi Wang, nouveau
  Cc: airlied, daniel, dakr, bskeggs, acurrid, cjia, smitra, ankita,
	aniketa, kwankhede, targupta, zhiwang, Nouveau

On Sat Feb 8, 2025 at 2:58 AM JST, Zhi Wang wrote:
> There can be multiple cases of handling the GSP RPC messages, which are
> the reply of GSP RPC commands according to the requirement of the callers
> and the nature of the GSP RPC commands.
>
> The current supported reply policies are "callers don't care" or "receive
> the entire message" according to the requirement of the caller. For
> introducing a new policy, factor out the current RPC command reply
> polices.
>
> Factor out NVKM_GSP_RPC_REPLY_RECV as "receive the entire message". If
> none is specified, the default is "callers don't care".
>
> No functional change is intended.
>
> Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> ---
>  .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 16 +++++---
>  .../gpu/drm/nouveau/nvkm/subdev/bar/r535.c    |  2 +-
>  .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 41 +++++++++++--------
>  .../drm/nouveau/nvkm/subdev/instmem/r535.c    |  2 +-
>  4 files changed, 36 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> index 746e126c3ecf..c467e44cab47 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> @@ -31,6 +31,10 @@ typedef int (*nvkm_gsp_msg_ntfy_func)(void *priv, u32 fn, void *repv, u32 repc);
>  struct nvkm_gsp_event;
>  typedef void (*nvkm_gsp_event_func)(struct nvkm_gsp_event *, void *repv, u32 repc);
>  
> +enum {
> +	NVKM_GSP_RPC_REPLY_RECV = 1,
> +};

Let's turn this into a named type and add a variant for the 0 value, e.g.

enum nvkm_gsp_rpc_reply_type {
  NVKM_GSP_RPC_DONT_CARE = 0,
  NVKM_GSP_RPC_REPLY_RECV = 1,
}

and use this type instead of an integer in the client code. This will
make the compiler warn us if we try to pass an unexpected value.

(DONT_CARE needs to be defined to something less ambiguous, I left it
as-is because I don't really understand the intent behind this name :))

> +
>  struct nvkm_gsp {
>  	const struct nvkm_gsp_func *func;
>  	struct nvkm_subdev subdev;
> @@ -188,7 +192,7 @@ struct nvkm_gsp {
>  
>  	const struct nvkm_gsp_rm {
>  		void *(*rpc_get)(struct nvkm_gsp *, u32 fn, u32 argc);
> -		void *(*rpc_push)(struct nvkm_gsp *, void *argv, bool wait, u32 repc);
> +		void *(*rpc_push)(struct nvkm_gsp *gsp, void *argv, int reply, u32 repc);
>  		void (*rpc_done)(struct nvkm_gsp *gsp, void *repv);
>  
>  		void *(*rm_ctrl_get)(struct nvkm_gsp_object *, u32 cmd, u32 argc);
> @@ -255,9 +259,9 @@ nvkm_gsp_rpc_get(struct nvkm_gsp *gsp, u32 fn, u32 argc)
>  }
>  
>  static inline void *
> -nvkm_gsp_rpc_push(struct nvkm_gsp *gsp, void *argv, bool wait, u32 repc)
> +nvkm_gsp_rpc_push(struct nvkm_gsp *gsp, void *argv, int reply, u32 repc)
>  {
> -	return gsp->rm->rpc_push(gsp, argv, wait, repc);
> +	return gsp->rm->rpc_push(gsp, argv, reply, repc);
>  }
>  
>  static inline void *
> @@ -268,13 +272,13 @@ nvkm_gsp_rpc_rd(struct nvkm_gsp *gsp, u32 fn, u32 argc)
>  	if (IS_ERR_OR_NULL(argv))
>  		return argv;
>  
> -	return nvkm_gsp_rpc_push(gsp, argv, true, argc);
> +	return nvkm_gsp_rpc_push(gsp, argv, NVKM_GSP_RPC_REPLY_RECV, argc);
>  }
>  
>  static inline int
> -nvkm_gsp_rpc_wr(struct nvkm_gsp *gsp, void *argv, bool wait)
> +nvkm_gsp_rpc_wr(struct nvkm_gsp *gsp, void *argv, int reply)
>  {
> -	void *repv = nvkm_gsp_rpc_push(gsp, argv, wait, 0);
> +	void *repv = nvkm_gsp_rpc_push(gsp, argv, reply, 0);
>  
>  	if (IS_ERR(repv))
>  		return PTR_ERR(repv);
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c
> index 3a30bea30e36..90186f98065c 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c
> @@ -56,7 +56,7 @@ r535_bar_bar2_update_pde(struct nvkm_gsp *gsp, u64 addr)
>  	rpc->info.entryValue = addr ? ((addr >> 4) | 2) : 0; /* PD3 entry format! */
>  	rpc->info.entryLevelShift = 47; //XXX: probably fetch this from mmu!
>  
> -	return nvkm_gsp_rpc_wr(gsp, rpc, true);
> +	return nvkm_gsp_rpc_wr(gsp, rpc, NVKM_GSP_RPC_REPLY_RECV);
>  }
>  
>  static void
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> index 1ed7d5624a56..bc8eb9a3cb28 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> @@ -584,25 +584,32 @@ r535_gsp_rpc_poll(struct nvkm_gsp *gsp, u32 fn)
>  }
>  
>  static void *
> -r535_gsp_rpc_handle_reply(struct nvkm_gsp *gsp, u32 fn, bool wait,
> +r535_gsp_rpc_handle_reply(struct nvkm_gsp *gsp, u32 fn, int reply,
>  			  u32 gsp_rpc_len)

So here 'int reply' would become 'enum nvkm_gsp_rpc_reply_type reply'
(and propagate to other callers).

>  {
>  	struct nvfw_gsp_rpc *msg;
>  	void *repv = NULL;
>  
> -	if (wait) {
> +	if (!reply)
> +		return NULL;
> +
> +	switch (reply) {
> +	case NVKM_GSP_RPC_REPLY_RECV:
>  		msg = r535_gsp_msg_recv(gsp, fn, gsp_rpc_len);
>  		if (!IS_ERR_OR_NULL(msg))
>  			repv = msg->data;
>  		else
>  			repv = msg;
> +		break;
> +	default:
> +		repv = ERR_PTR(-EINVAL);
> +		break;
>  	}

With the introduced type, this would become:

switch (reply) {
  case NVKM_GSP_RPC_DONT_CARE:
    /* Works as repv is NULL already */
    break;
  case NVKM_GSP_RPC_REPLY_RECV:
    msg = r535_gsp_msg_recv(gsp, fn, gsp_rpc_len);
    if (!IS_ERR_OR_NULL(msg))
      repv = msg->data;
    else
      repv = msg;
    break;
}

I'm not sure whether you still need a 'default' arm? The compiler is
happy without it, and since you control all the call sites nothing funny
can happen without a dirty cast.

> -

No need to remove this separator line IMHO.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] drm/nouveau/nvkm: introduce new GSP reply policy NVKM_GSP_RPC_REPLY_POLL
  2025-02-07 17:58 ` [PATCH 3/5] drm/nouveau/nvkm: introduce new GSP reply policy NVKM_GSP_RPC_REPLY_POLL Zhi Wang
@ 2025-02-12  6:57   ` Alexandre Courbot
  2025-02-12 13:33     ` Danilo Krummrich
  0 siblings, 1 reply; 15+ messages in thread
From: Alexandre Courbot @ 2025-02-12  6:57 UTC (permalink / raw)
  To: Zhi Wang, nouveau
  Cc: airlied, daniel, dakr, bskeggs, acurrid, cjia, smitra, ankita,
	aniketa, kwankhede, targupta, zhiwang, Nouveau

On Sat Feb 8, 2025 at 2:58 AM JST, Zhi Wang wrote:
> There can be multiple cases of handling the GSP RPC messages, which
> are the reply of GSP RPC commands according to the requirement of the
> callers and the nature of the GSP RPC commands.
>
> Some GSP RPC command needs a new reply policy: "caller don't care about
> the message content but want to make sure a reply is received". To
> support this case, a new reply policy is introduced.
>
> Introduce new reply policy NVKM_GSP_RPC_REPLY_POLL, which waits for the
> returned GSP message but discards it for the caller.
>
> No functional change is intended.
>
> Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> ---
>  drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 3 +++
>  2 files changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> index c467e44cab47..bc16510261b8 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> @@ -33,6 +33,7 @@ typedef void (*nvkm_gsp_event_func)(struct nvkm_gsp_event *, void *repv, u32 rep
>  
>  enum {
>  	NVKM_GSP_RPC_REPLY_RECV = 1,
> +	NVKM_GSP_RPC_REPLY_POLL,
>  };
>  
>  struct nvkm_gsp {
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> index bc8eb9a3cb28..af2836cca38f 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> @@ -601,6 +601,9 @@ r535_gsp_rpc_handle_reply(struct nvkm_gsp *gsp, u32 fn, int reply,
>  		else
>  			repv = msg;
>  		break;
> +	case NVKM_GSP_RPC_REPLY_POLL:
> +		repv = r535_gsp_msg_recv(gsp, fn, 0);
> +		break;
>  	default:
>  		repv = ERR_PTR(-EINVAL);
>  		break;

I suspect patch 4 can be merged into this one, so we introduce the user
at the same time as the functionality.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/5] drm/nouveau/nvkm: introduce a kernel doc for GSP message handling
  2025-02-07 17:58 ` [PATCH 5/5] drm/nouveau/nvkm: introduce a kernel doc for GSP message handling Zhi Wang
@ 2025-02-12  6:59   ` Alexandre Courbot
  0 siblings, 0 replies; 15+ messages in thread
From: Alexandre Courbot @ 2025-02-12  6:59 UTC (permalink / raw)
  To: Zhi Wang, nouveau
  Cc: airlied, daniel, dakr, bskeggs, acurrid, cjia, smitra, ankita,
	aniketa, kwankhede, targupta, zhiwang, Nouveau

On Sat Feb 8, 2025 at 2:58 AM JST, Zhi Wang wrote:
> Introduce a kernel doc to describe the GSP message handling policy.
>
> Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> ---
>  Documentation/gpu/nouveau.rst                   |  3 +++
>  .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h   | 17 +++++++++++++++++
>  2 files changed, 20 insertions(+)
>
> diff --git a/Documentation/gpu/nouveau.rst b/Documentation/gpu/nouveau.rst
> index 0f34131ccc27..b8c801e0068c 100644
> --- a/Documentation/gpu/nouveau.rst
> +++ b/Documentation/gpu/nouveau.rst
> @@ -27,3 +27,6 @@ GSP Support
>  
>  .. kernel-doc:: drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
>     :doc: GSP message queue element
> +
> +.. kernel-doc:: drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> +   :doc: GSP message handling policy
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> index bc16510261b8..2d0b80a733d7 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> @@ -31,6 +31,23 @@ typedef int (*nvkm_gsp_msg_ntfy_func)(void *priv, u32 fn, void *repv, u32 repc);
>  struct nvkm_gsp_event;
>  typedef void (*nvkm_gsp_event_func)(struct nvkm_gsp_event *, void *repv, u32 repc);
>  
> +/**
> + * DOC: GSP message handling policy
> + *
> + * When sending a GSP RPC command, there can be multiple cases of handling
> + * the GSP RPC messages, which are the reply of GSP RPC commands, according
> + * to the requirement of the callers and the nature of the GSP RPC commands.
> + *
> + * If none is specified, the policy is the callers don't care. Immediately
> + * return to the caller after the GSP RPC command is issued.
> + *
> + * NVKM_GSP_RPC_REPLY_RECV - If specified, wait and receive the entire GSP
> + * RPC message after the GSP RPC command is issued.
> + *
> + * NVKM_GSP_RPC_REPLY_POLL - If specified, wait for the specific reply and
> + * discard the reply before returning to the caller.
> + *
> + */
>  enum {
>  	NVKM_GSP_RPC_REPLY_RECV = 1,
>  	NVKM_GSP_RPC_REPLY_POLL,

It would IMHO look more natural if the documentation was introduced at
the same time as the variants themselves, so they are not left
undocumented for some range of the git history.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] drm/nouveau/nvkm: introduce new GSP reply policy NVKM_GSP_RPC_REPLY_POLL
  2025-02-12  6:57   ` Alexandre Courbot
@ 2025-02-12 13:33     ` Danilo Krummrich
  2025-02-12 19:59       ` Zhi Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Danilo Krummrich @ 2025-02-12 13:33 UTC (permalink / raw)
  To: Alexandre Courbot, Zhi Wang
  Cc: nouveau, airlied, daniel, bskeggs, acurrid, cjia, smitra, ankita,
	aniketa, kwankhede, targupta, zhiwang, Nouveau

On Wed, Feb 12, 2025 at 03:57:38PM +0900, Alexandre Courbot wrote:
> On Sat Feb 8, 2025 at 2:58 AM JST, Zhi Wang wrote:
> > There can be multiple cases of handling the GSP RPC messages, which
> > are the reply of GSP RPC commands according to the requirement of the
> > callers and the nature of the GSP RPC commands.
> >
> > Some GSP RPC command needs a new reply policy: "caller don't care about
> > the message content but want to make sure a reply is received". To
> > support this case, a new reply policy is introduced.
> >
> > Introduce new reply policy NVKM_GSP_RPC_REPLY_POLL, which waits for the
> > returned GSP message but discards it for the caller.
> >
> > No functional change is intended.
> >
> > Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> > ---
> >  drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 1 +
> >  drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 3 +++
> >  2 files changed, 4 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> > index c467e44cab47..bc16510261b8 100644
> > --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> > +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> > @@ -33,6 +33,7 @@ typedef void (*nvkm_gsp_event_func)(struct nvkm_gsp_event *, void *repv, u32 rep
> >  
> >  enum {
> >  	NVKM_GSP_RPC_REPLY_RECV = 1,
> > +	NVKM_GSP_RPC_REPLY_POLL,
> >  };
> >  
> >  struct nvkm_gsp {
> > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> > index bc8eb9a3cb28..af2836cca38f 100644
> > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> > @@ -601,6 +601,9 @@ r535_gsp_rpc_handle_reply(struct nvkm_gsp *gsp, u32 fn, int reply,
> >  		else
> >  			repv = msg;
> >  		break;
> > +	case NVKM_GSP_RPC_REPLY_POLL:
> > +		repv = r535_gsp_msg_recv(gsp, fn, 0);
> > +		break;
> >  	default:
> >  		repv = ERR_PTR(-EINVAL);
> >  		break;
> 
> I suspect patch 4 can be merged into this one, so we introduce the user
> at the same time as the functionality.

I agree, please merge this one with patch 4 and add the corresponding 'Fixes'
tag.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] NVKM GSP RPC message handling policy
  2025-02-07 17:58 [PATCH 0/5] NVKM GSP RPC message handling policy Zhi Wang
                   ` (4 preceding siblings ...)
  2025-02-07 17:58 ` [PATCH 5/5] drm/nouveau/nvkm: introduce a kernel doc for GSP message handling Zhi Wang
@ 2025-02-12 13:36 ` Danilo Krummrich
  2025-02-20 20:48 ` Ben Skeggs
  6 siblings, 0 replies; 15+ messages in thread
From: Danilo Krummrich @ 2025-02-12 13:36 UTC (permalink / raw)
  To: Zhi Wang
  Cc: nouveau, airlied, daniel, bskeggs, acurrid, cjia, smitra, ankita,
	aniketa, kwankhede, targupta, zhiwang

For future submissions, please make sure to also add the other nouveau
maintainers to keep them in the loop.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] drm/nouveau/nvkm: factor out the current RPC command reply policies
  2025-02-12  6:55   ` Alexandre Courbot
@ 2025-02-12 19:56     ` Zhi Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Zhi Wang @ 2025-02-12 19:56 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: nouveau, airlied, daniel, dakr, bskeggs, acurrid, cjia, smitra,
	ankita, aniketa, kwankhede, targupta, zhiwang, Nouveau

On Wed, 12 Feb 2025 15:55:08 +0900
"Alexandre Courbot" <acourbot@nvidia.com> wrote:

> On Sat Feb 8, 2025 at 2:58 AM JST, Zhi Wang wrote:
> > There can be multiple cases of handling the GSP RPC messages, which are
> > the reply of GSP RPC commands according to the requirement of the callers
> > and the nature of the GSP RPC commands.
> >
> > The current supported reply policies are "callers don't care" or "receive
> > the entire message" according to the requirement of the caller. For
> > introducing a new policy, factor out the current RPC command reply
> > polices.
> >
> > Factor out NVKM_GSP_RPC_REPLY_RECV as "receive the entire message". If
> > none is specified, the default is "callers don't care".
> >
> > No functional change is intended.
> >
> > Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> > ---
> >  .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 16 +++++---
> >  .../gpu/drm/nouveau/nvkm/subdev/bar/r535.c    |  2 +-
> >  .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 41 +++++++++++--------
> >  .../drm/nouveau/nvkm/subdev/instmem/r535.c    |  2 +-
> >  4 files changed, 36 insertions(+), 25 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> > index 746e126c3ecf..c467e44cab47 100644
> > --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> > +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> > @@ -31,6 +31,10 @@ typedef int (*nvkm_gsp_msg_ntfy_func)(void *priv, u32 fn, void *repv, u32 repc);
> >  struct nvkm_gsp_event;
> >  typedef void (*nvkm_gsp_event_func)(struct nvkm_gsp_event *, void *repv, u32 repc);
> >  
> > +enum {
> > +	NVKM_GSP_RPC_REPLY_RECV = 1,
> > +};  
> 
> Let's turn this into a named type and add a variant for the 0 value, e.g.
> 
> enum nvkm_gsp_rpc_reply_type {
>   NVKM_GSP_RPC_DONT_CARE = 0,
>   NVKM_GSP_RPC_REPLY_RECV = 1,
> }
> 
> and use this type instead of an integer in the client code. This will
> make the compiler warn us if we try to pass an unexpected value.
> 
> (DONT_CARE needs to be defined to something less ambiguous, I left it
> as-is because I don't really understand the intent behind this name :))
> 

Thanks for the idea.

I was struggling to come up with a perfect name as well. DONT_CARE was on
the list, but it seems so heavy when considering that DONT_CARE is the
most common case and not looking ideal when it spreads to the call sites. :)

Probably I will go with NOWAIT.

> > +
> >  struct nvkm_gsp {
> >  	const struct nvkm_gsp_func *func;
> >  	struct nvkm_subdev subdev;
> > @@ -188,7 +192,7 @@ struct nvkm_gsp {
> >  
> >  	const struct nvkm_gsp_rm {
> >  		void *(*rpc_get)(struct nvkm_gsp *, u32 fn, u32 argc);
> > -		void *(*rpc_push)(struct nvkm_gsp *, void *argv, bool wait, u32 repc);
> > +		void *(*rpc_push)(struct nvkm_gsp *gsp, void *argv, int reply, u32 repc);
> >  		void (*rpc_done)(struct nvkm_gsp *gsp, void *repv);
> >  
> >  		void *(*rm_ctrl_get)(struct nvkm_gsp_object *, u32 cmd, u32 argc);
> > @@ -255,9 +259,9 @@ nvkm_gsp_rpc_get(struct nvkm_gsp *gsp, u32 fn, u32 argc)
> >  }
> >  
> >  static inline void *
> > -nvkm_gsp_rpc_push(struct nvkm_gsp *gsp, void *argv, bool wait, u32 repc)
> > +nvkm_gsp_rpc_push(struct nvkm_gsp *gsp, void *argv, int reply, u32 repc)
> >  {
> > -	return gsp->rm->rpc_push(gsp, argv, wait, repc);
> > +	return gsp->rm->rpc_push(gsp, argv, reply, repc);
> >  }
> >  
> >  static inline void *
> > @@ -268,13 +272,13 @@ nvkm_gsp_rpc_rd(struct nvkm_gsp *gsp, u32 fn, u32 argc)
> >  	if (IS_ERR_OR_NULL(argv))
> >  		return argv;
> >  
> > -	return nvkm_gsp_rpc_push(gsp, argv, true, argc);
> > +	return nvkm_gsp_rpc_push(gsp, argv, NVKM_GSP_RPC_REPLY_RECV, argc);
> >  }
> >  
> >  static inline int
> > -nvkm_gsp_rpc_wr(struct nvkm_gsp *gsp, void *argv, bool wait)
> > +nvkm_gsp_rpc_wr(struct nvkm_gsp *gsp, void *argv, int reply)
> >  {
> > -	void *repv = nvkm_gsp_rpc_push(gsp, argv, wait, 0);
> > +	void *repv = nvkm_gsp_rpc_push(gsp, argv, reply, 0);
> >  
> >  	if (IS_ERR(repv))
> >  		return PTR_ERR(repv);
> > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c
> > index 3a30bea30e36..90186f98065c 100644
> > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c
> > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c
> > @@ -56,7 +56,7 @@ r535_bar_bar2_update_pde(struct nvkm_gsp *gsp, u64 addr)
> >  	rpc->info.entryValue = addr ? ((addr >> 4) | 2) : 0; /* PD3 entry format! */
> >  	rpc->info.entryLevelShift = 47; //XXX: probably fetch this from mmu!
> >  
> > -	return nvkm_gsp_rpc_wr(gsp, rpc, true);
> > +	return nvkm_gsp_rpc_wr(gsp, rpc, NVKM_GSP_RPC_REPLY_RECV);
> >  }
> >  
> >  static void
> > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> > index 1ed7d5624a56..bc8eb9a3cb28 100644
> > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> > @@ -584,25 +584,32 @@ r535_gsp_rpc_poll(struct nvkm_gsp *gsp, u32 fn)
> >  }
> >  
> >  static void *
> > -r535_gsp_rpc_handle_reply(struct nvkm_gsp *gsp, u32 fn, bool wait,
> > +r535_gsp_rpc_handle_reply(struct nvkm_gsp *gsp, u32 fn, int reply,
> >  			  u32 gsp_rpc_len)  
> 
> So here 'int reply' would become 'enum nvkm_gsp_rpc_reply_type reply'
> (and propagate to other callers).
> 
> >  {
> >  	struct nvfw_gsp_rpc *msg;
> >  	void *repv = NULL;
> >  
> > -	if (wait) {
> > +	if (!reply)
> > +		return NULL;
> > +
> > +	switch (reply) {
> > +	case NVKM_GSP_RPC_REPLY_RECV:
> >  		msg = r535_gsp_msg_recv(gsp, fn, gsp_rpc_len);
> >  		if (!IS_ERR_OR_NULL(msg))
> >  			repv = msg->data;
> >  		else
> >  			repv = msg;
> > +		break;
> > +	default:
> > +		repv = ERR_PTR(-EINVAL);
> > +		break;
> >  	}  
> 
> With the introduced type, this would become:
> 
> switch (reply) {
>   case NVKM_GSP_RPC_DONT_CARE:
>     /* Works as repv is NULL already */
>     break;
>   case NVKM_GSP_RPC_REPLY_RECV:
>     msg = r535_gsp_msg_recv(gsp, fn, gsp_rpc_len);
>     if (!IS_ERR_OR_NULL(msg))
>       repv = msg->data;
>     else
>       repv = msg;
>     break;
> }
> 
> I'm not sure whether you still need a 'default' arm? The compiler is
> happy without it, and since you control all the call sites nothing funny
> can happen without a dirty cast.
> 
> > -  
> 
> No need to remove this separator line IMHO.
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] drm/nouveau/nvkm: introduce new GSP reply policy NVKM_GSP_RPC_REPLY_POLL
  2025-02-12 13:33     ` Danilo Krummrich
@ 2025-02-12 19:59       ` Zhi Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Zhi Wang @ 2025-02-12 19:59 UTC (permalink / raw)
  To: Danilo Krummrich, Alexandre Courbot
  Cc: nouveau@lists.freedesktop.org, airlied@gmail.com, daniel@ffwll.ch,
	Ben Skeggs, Andy Currid, Neo Jia, Surath Mitra, Ankit Agrawal,
	Aniket Agashe, Kirti Wankhede, Tarun Gupta (SW-GPU),
	zhiwang@kernel.org, Nouveau

On 12/02/2025 15.33, Danilo Krummrich wrote:
> On Wed, Feb 12, 2025 at 03:57:38PM +0900, Alexandre Courbot wrote:
>> On Sat Feb 8, 2025 at 2:58 AM JST, Zhi Wang wrote:
>>> There can be multiple cases of handling the GSP RPC messages, which
>>> are the reply of GSP RPC commands according to the requirement of the
>>> callers and the nature of the GSP RPC commands.
>>>
>>> Some GSP RPC command needs a new reply policy: "caller don't care about
>>> the message content but want to make sure a reply is received". To
>>> support this case, a new reply policy is introduced.
>>>
>>> Introduce new reply policy NVKM_GSP_RPC_REPLY_POLL, which waits for the
>>> returned GSP message but discards it for the caller.
>>>
...
> I agree, please merge this one with patch 4 and add the corresponding 'Fixes'
> tag.

Thanks for the comment. I saw the previous patch is still on the 
-misc-next, so I was not sure if Fixes: is need here. Will add it in the 
re-spin.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] NVKM GSP RPC message handling policy
  2025-02-07 17:58 [PATCH 0/5] NVKM GSP RPC message handling policy Zhi Wang
                   ` (5 preceding siblings ...)
  2025-02-12 13:36 ` [PATCH 0/5] NVKM GSP RPC message handling policy Danilo Krummrich
@ 2025-02-20 20:48 ` Ben Skeggs
  6 siblings, 0 replies; 15+ messages in thread
From: Ben Skeggs @ 2025-02-20 20:48 UTC (permalink / raw)
  To: Zhi Wang, nouveau
  Cc: airlied, daniel, dakr, acurrid, cjia, smitra, ankita, aniketa,
	kwankhede, targupta, zhiwang

On 8/2/25 03:58, Zhi Wang wrote:

> Ben reported an issue that the patch [1] breaks the suspend/resume.
>
> After digging for a while, I noticed that this problem had been there
> before introducing that patch, but not exposed because r535_gsp_rpc_push()
> doesn't repsect the caller's requirement when handling the large RPC
> command: It won't wait for the reply even the caller requires. (Small
> RPCs are fine.)
>
> After that patch series is introduced, r535_gsp_rpc_push() really waits
> for the reply and receives the entire GSP message, which is required
> by the large vGPU RPC command.
>
> There are currently two GSP RPC message handling policy:
>
> - a. dont care. discard the message before returning to the caller.
> - b. receive the entire message. wait and receive the entire message before
>    returning to the caller.
>
> On the path of suspend/resume, there is a large GSP command
> NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, which returns only a GSP RPC message
> header to tell the driver that the request is handled. The policy in the
> driver is to receive the entrie message, which ends up with a timeout
> and error when r535_gsp_rpc_push() tries to receive the message. That
> breaks the suspend/resume path.
>
> This series factors out the current GSP RPC message handling policy and
> introduces a new policy for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY and a
> kernel doc to illustrate the policies.
>
> With this patchset, the problem can't be reproduced and suspend/resume
> works on my L40.

This seems to fix the issue here on top of current drm-misc-next.

Tested-by: Ben Skeggs <bskeggs@nvidia.com>

>
> [1] https://lore.kernel.org/nouveau/7eb31f1f-fc3a-4fb5-86cf-4bd011d68ff1@nvidia.com/T/#t
>
> Zhi Wang (5):
>    drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply()
>    drm/nouveau/nvkm: factor out the current RPC command reply policies
>    drm/nouveau/nvkm: introduce new GSP reply policy
>      NVKM_GSP_RPC_REPLY_POLL
>    drm/nouveau/nvkm: use the new policy for
>      NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY
>    drm/nouveau/nvkm: introduce a kernel doc for GSP message handling
>
>   Documentation/gpu/nouveau.rst                 |  3 +
>   .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 34 ++++++--
>   .../gpu/drm/nouveau/nvkm/subdev/bar/r535.c    |  2 +-
>   .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 80 +++++++++++--------
>   .../drm/nouveau/nvkm/subdev/instmem/r535.c    |  2 +-
>   5 files changed, 78 insertions(+), 43 deletions(-)
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-02-20 20:48 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-07 17:58 [PATCH 0/5] NVKM GSP RPC message handling policy Zhi Wang
2025-02-07 17:58 ` [PATCH 1/5] drm/nouveau/nvkm: factor out r535_gsp_rpc_handle_reply() Zhi Wang
2025-02-12  6:32   ` Alexandre Courbot
2025-02-07 17:58 ` [PATCH 2/5] drm/nouveau/nvkm: factor out the current RPC command reply policies Zhi Wang
2025-02-12  6:55   ` Alexandre Courbot
2025-02-12 19:56     ` Zhi Wang
2025-02-07 17:58 ` [PATCH 3/5] drm/nouveau/nvkm: introduce new GSP reply policy NVKM_GSP_RPC_REPLY_POLL Zhi Wang
2025-02-12  6:57   ` Alexandre Courbot
2025-02-12 13:33     ` Danilo Krummrich
2025-02-12 19:59       ` Zhi Wang
2025-02-07 17:58 ` [PATCH 4/5] drm/nouveau/nvkm: use the new policy for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY Zhi Wang
2025-02-07 17:58 ` [PATCH 5/5] drm/nouveau/nvkm: introduce a kernel doc for GSP message handling Zhi Wang
2025-02-12  6:59   ` Alexandre Courbot
2025-02-12 13:36 ` [PATCH 0/5] NVKM GSP RPC message handling policy Danilo Krummrich
2025-02-20 20:48 ` Ben Skeggs

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.