Re: [PATCH 1/3] drm/i915/guc: Add non blocking CTB send function

From: Matthew Brost <matthew.brost@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Intel-GFX@lists.freedesktop.org
Subject: Re: [PATCH 1/3] drm/i915/guc: Add non blocking CTB send function
Date: Thu, 21 Nov 2019 17:34:22 -0800	[thread overview]
Message-ID: <20191122013422.GB64720@sdutt> (raw)
In-Reply-To: <20191122001325.GA92577@sdutt>

On Thu, Nov 21, 2019 at 04:13:25PM -0800, Matthew Brost wrote:
>On Thu, Nov 21, 2019 at 12:43:26PM +0100, Michal Wajdeczko wrote:
>>On Thu, 21 Nov 2019 00:56:02 +0100, <John.C.Harrison@intel.com> wrote:
>>
>>>From: Matthew Brost <matthew.brost@intel.com>
>>>
>>>Add non blocking CTB send fuction, intel_guc_send_nb. In order to
>>>support a non blocking CTB send fuction a spin lock is needed to
>>
>>2x typos
>>
>>>protect the CTB descriptors fields. Also the non blocking call must not
>>>update the fence value as this value is owned by the blocking call
>>>(intel_guc_send).
>>
>>you probably mean "intel_guc_send_ct", as intel_guc_send is just a wrapper
>>around guc->send
>>
>
>Ah, yes.
>
>>>
>>>The blocking CTB now must have a flow control mechanism to ensure the
>>>buffer isn't overrun. A lazy spin wait is used as we believe the flow
>>>control condition should be rare with properly sized buffer. A retry
>>>counter is also implemented which fails H2G CTBs once a limit is
>>>reached to prevent deadlock.
>>>
>>>The function, intel_guc_send_nb, is exported in this patch but unused.
>>>Several patches later in the series make use of this function.
>>
>>It's likely in yet another series
>>
>
>Yes, it is.
>
>>>
>>>Cc: John Harrison <john.c.harrison@intel.com>
>>>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>---
>>>drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  2 +
>>>drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 97 +++++++++++++++++++----
>>>drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 10 ++-
>>>3 files changed, 91 insertions(+), 18 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
>>>b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>index e6400204a2bd..77c5af919ace 100644
>>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>@@ -94,6 +94,8 @@ intel_guc_send_and_receive(struct intel_guc 
>>>*guc, const u32 *action, u32 len,
>>>	return guc->send(guc, action, len, response_buf, response_buf_size);
>>>}
>>>+int intel_guc_send_nb(struct intel_guc_ct *ct, const u32 *action, 
>>>u32 len);
>>>+
>>
>>Hmm, this mismatch of guc/ct parameter breaks the our layering.
>>But we can keep this layering intact by introducing some flags to
>>the existing guc_send() function. These flags could be passed as
>>high bits in action[0], like this:
>
>This seems reasonable.
>

Prototyped this and I don't like it all. First 'action' is a const so what you
are suggesting doesn't work unless that is changed. Also what if all bits in DW
eventually mean something, to me overloading a field isn't a good idea if
anything we should add another argument to guc->send(). But I'd honestly prefer
we just leave it as is. Non-blocking only applies to CTs (not MMIO) and we have
GEM_BUG_ON to protect us if this function is called incorrectly. Doing what you
suggest just makes everything more complicated IMO.

Matt

>>
>>#define GUC_ACTION_FLAG_DONT_WAIT 0x80000000
>>
>>int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
>>{
>>	u32 action[] = {
>>		INTEL_GUC_ACTION_AUTHENTICATE_HUC | GUC_ACTION_FLAG_DONT_WAIT,
>>		rsa_offset
>>	};
>>
>>	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>>}
>>
>>then actual back-end of guc->send can take proper steps based on this flag:
>>
>>@@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, 
>>u32 len,
>>       GEM_BUG_ON(!len);
>>       GEM_BUG_ON(len > guc->send_regs.count);
>>
>>+       if (*action & GUC_ACTION_FLAG_DONT_WAIT)
>>+               return -EINVAL;
>>+       *action &= ~GUC_ACTION_FLAG_DONT_WAIT;
>>+
>>       /* We expect only action code */
>>       GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK);
>>
>>@@ @@ int intel_guc_send_ct(struct intel_guc *guc, const u32 
>>*action, u32 len,
>>       u32 status = ~0; /* undefined */
>>       int ret;
>>
>>+       if (*action & GUC_ACTION_FLAG_DONT_WAIT) {
>>+               GEM_BUG_ON(response_buf);
>>+               GEM_BUG_ON(response_buf_size);
>>+               return ctch_send_nb(ct, ctch, action, len);
>>+       }
>>+
>>       mutex_lock(&guc->send_mutex);
>>
>>       ret = ctch_send(ct, ctch, action, len, response_buf, 
>>response_buf_size,
>>
>>
>>>static inline void intel_guc_notify(struct intel_guc *guc)
>>>{
>>>	guc->notify(guc);
>>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
>>>b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>index b49115517510..e50d968b15d5 100644
>>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>@@ -3,6 +3,8 @@
>>> * Copyright © 2016-2019 Intel Corporation
>>> */
>>>+#include <linux/circ_buf.h>
>>>+
>>>#include "i915_drv.h"
>>>#include "intel_guc_ct.h"
>>>@@ -12,6 +14,8 @@
>>>#define CT_DEBUG_DRIVER(...)	do { } while (0)
>>>#endif
>>>+#define MAX_RETRY		0x1000000
>>>+
>>>struct ct_request {
>>>	struct list_head link;
>>>	u32 fence;
>>>@@ -40,7 +44,8 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>>>	/* we're using static channel owners */
>>>	ct->host_channel.owner = CTB_OWNER_HOST;
>>>-	spin_lock_init(&ct->lock);
>>>+	spin_lock_init(&ct->request_lock);
>>>+	spin_lock_init(&ct->send_lock);
>>>	INIT_LIST_HEAD(&ct->pending_requests);
>>>	INIT_LIST_HEAD(&ct->incoming_requests);
>>>	INIT_WORK(&ct->worker, ct_incoming_request_worker_func);
>>>@@ -291,7 +296,8 @@ static u32 ctch_get_next_fence(struct 
>>>intel_guc_ct_channel *ctch)
>>>static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>>		     const u32 *action,
>>>		     u32 len /* in dwords */,
>>>-		     u32 fence,
>>>+		     u32 fence_value,
>>>+		     bool enable_fence,
>>
>>maybe we can just guarantee that fence=0 will never be used as a valid
>>fence id, then this flag could be replaced with (fence != 0) check.
>>
>
>Yes, again seems reasonable. Initialize next_fence = 1, then increment by 2 each
>time and this works.
>
>>>		     bool want_response)
>>>{
>>>	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>@@ -328,18 +334,18 @@ static int ctb_write(struct 
>>>intel_guc_ct_buffer *ctb,
>>>	 * DW2+: action data
>>>	 */
>>>	header = (len << GUC_CT_MSG_LEN_SHIFT) |
>>>-		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
>>>+		 (enable_fence ? GUC_CT_MSG_WRITE_FENCE_TO_DESC : 0) |
>>
>>Hmm, even if we ask fw to do not write back fence to the descriptor,
>>IIRC current firmware will unconditionally write back return status
>>of this non-blocking call, possibly overwriting status of the blocked
>>call.
>>
>
>Yes, known problem with the interface that needs to be fixed.
>
>>>		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
>>
>>btw, if we switch all requests to expect reply send back over CTB,
>>then we can possibly drop the send_mutex in CTB paths, and block
>>only when there is no DONT_WAIT flag and we have to wait for response.
>>
>
>Rather just wait for the GuC to fix this.
>
>>>		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
>>>	CT_DEBUG_DRIVER("CT: writing %*ph %*ph %*ph\n",
>>>-			4, &header, 4, &fence,
>>>+			4, &header, 4, &fence_value,
>>>			4 * (len - 1), &action[1]);
>>>	cmds[tail] = header;
>>>	tail = (tail + 1) % size;
>>>-	cmds[tail] = fence;
>>>+	cmds[tail] = fence_value;
>>>	tail = (tail + 1) % size;
>>>	for (i = 1; i < len; i++) {
>>>@@ -440,6 +446,47 @@ static int wait_for_ct_request_update(struct 
>>>ct_request *req, u32 *status)
>>>	return err;
>>>}
>>>+static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, 
>>>u32 len)
>>>+{
>>>+	u32 head = READ_ONCE(desc->head);
>>>+	u32 space;
>>>+
>>>+	space = CIRC_SPACE(desc->tail, head, desc->size);
>>>+
>>>+	return space >= len;
>>>+}
>>>+
>>>+int intel_guc_send_nb(struct intel_guc_ct *ct,
>>>+		      const u32 *action,
>>>+		      u32 len)
>>>+{
>>>+	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>>>+	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>>+	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>+	int err;
>>>+
>>>+	GEM_BUG_ON(!ctch->enabled);
>>>+	GEM_BUG_ON(!len);
>>>+	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>+	lockdep_assert_held(&ct->send_lock);
>>
>>hmm, does it mean that now it's caller responsibility to spinlock
>>on CT private lock ? That is not how other guc_send() functions work.
>>
>
>Yes, that how I would like this work as I feel like it gives more flexability to
>the caller on the -EBUSY case. The caller can call intel_guc_send_nb again while
>still holding the lock or it release lock the and use a different form of flow
>control. Perhaps locking / unlocking should be exposed via static inlines rather
>than the caller directly manipulating the lock?
>
>>>+
>>>+	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>+		ct->retry++;
>>>+		if (ct->retry >= MAX_RETRY)
>>>+			return -EDEADLK;
>>>+		else
>>>+			return -EBUSY;
>>>+	}
>>>+
>>>+	ct->retry = 0;
>>>+	err = ctb_write(ctb, action, len, 0, false, false);
>>>+	if (unlikely(err))
>>>+		return err;
>>>+
>>>+	intel_guc_notify(ct_to_guc(ct));
>>>+	return 0;
>>>+}
>>>+
>>>static int ctch_send(struct intel_guc_ct *ct,
>>>		     struct intel_guc_ct_channel *ctch,
>>>		     const u32 *action,
>>>@@ -460,17 +507,35 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>	GEM_BUG_ON(!response_buf && response_buf_size);
>>>+	/*
>>>+	 * We use a lazy spin wait loop here as we believe that if the CT
>>>+	 * buffers are sized correctly the flow control condition should be
>>>+	 * rare.
>>>+	 */
>>>+retry:
>>>+	spin_lock_irqsave(&ct->send_lock, flags);
>>>+	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>+		spin_unlock_irqrestore(&ct->send_lock, flags);
>>>+		ct->retry++;
>>>+		if (ct->retry >= MAX_RETRY)
>>>+			return -EDEADLK;
>>
>>I'm not sure what's better: have secret deadlock hard to reproduce,
>>or deadlocks easier to catch that helps improve to be deadlock-clean
>>
>
>This is covering the case where the has died and to avoid deadlock. Eventually
>we will have some GuC health check code that will trigger a full GPU reset if
>the GuC has died. We need a way for code spinning on the CTBs to exit.
>
>I've already tweaked this code locally a bit to use an atomic too with the idea
>being the GuC health check code can set this value to have all code spinning on
>CTBs immediately return --EDEADLK when the GuC has died.
>
>>>+		cpu_relax();
>>>+		goto retry;
>>>+	}
>>>+
>>>+	ct->retry = 0;
>>>	fence = ctch_get_next_fence(ctch);
>>>	request.fence = fence;
>>>	request.status = 0;
>>>	request.response_len = response_buf_size;
>>>	request.response_buf = response_buf;
>>>-	spin_lock_irqsave(&ct->lock, flags);
>>>+	spin_lock(&ct->request_lock);
>>>	list_add_tail(&request.link, &ct->pending_requests);
>>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>>+	spin_unlock(&ct->request_lock);
>>>-	err = ctb_write(ctb, action, len, fence, !!response_buf);
>>>+	err = ctb_write(ctb, action, len, fence, true, !!response_buf);
>>>+	spin_unlock_irqrestore(&ct->send_lock, flags);
>>>	if (unlikely(err))
>>>		goto unlink;
>>>@@ -501,9 +566,9 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>	}
>>>unlink:
>>>-	spin_lock_irqsave(&ct->lock, flags);
>>>+	spin_lock_irqsave(&ct->request_lock, flags);
>>>	list_del(&request.link);
>>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>>+	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>	return err;
>>>}
>>>@@ -653,7 +718,7 @@ static int ct_handle_response(struct 
>>>intel_guc_ct *ct, const u32 *msg)
>>>	CT_DEBUG_DRIVER("CT: response fence %u status %#x\n", fence, status);
>>>-	spin_lock(&ct->lock);
>>>+	spin_lock(&ct->request_lock);
>>>	list_for_each_entry(req, &ct->pending_requests, link) {
>>>		if (unlikely(fence != req->fence)) {
>>>			CT_DEBUG_DRIVER("CT: request %u awaits response\n",
>>>@@ -672,7 +737,7 @@ static int ct_handle_response(struct 
>>>intel_guc_ct *ct, const u32 *msg)
>>>		found = true;
>>>		break;
>>>	}
>>>-	spin_unlock(&ct->lock);
>>>+	spin_unlock(&ct->request_lock);
>>>	if (!found)
>>>		DRM_ERROR("CT: unsolicited response %*ph\n", 4 * msglen, msg);
>>>@@ -710,13 +775,13 @@ static bool 
>>>ct_process_incoming_requests(struct intel_guc_ct *ct)
>>>	u32 *payload;
>>>	bool done;
>>>-	spin_lock_irqsave(&ct->lock, flags);
>>>+	spin_lock_irqsave(&ct->request_lock, flags);
>>>	request = list_first_entry_or_null(&ct->incoming_requests,
>>>					   struct ct_incoming_request, link);
>>>	if (request)
>>>		list_del(&request->link);
>>>	done = !!list_empty(&ct->incoming_requests);
>>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>>+	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>	if (!request)
>>>		return true;
>>>@@ -777,9 +842,9 @@ static int ct_handle_request(struct 
>>>intel_guc_ct *ct, const u32 *msg)
>>>	}
>>>	memcpy(request->msg, msg, 4 * msglen);
>>>-	spin_lock_irqsave(&ct->lock, flags);
>>>+	spin_lock_irqsave(&ct->request_lock, flags);
>>>	list_add_tail(&request->link, &ct->incoming_requests);
>>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>>+	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>	queue_work(system_unbound_wq, &ct->worker);
>>>	return 0;
>>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h 
>>>b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>index 7c24d83f5c24..bc670a796bd8 100644
>>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>@@ -62,8 +62,11 @@ struct intel_guc_ct {
>>>	struct intel_guc_ct_channel host_channel;
>>>	/* other channels are tbd */
>>>-	/** @lock: protects pending requests list */
>>>-	spinlock_t lock;
>>>+	/** @request_lock: protects pending requests list */
>>>+	spinlock_t request_lock;
>>>+
>>>+	/** @send_lock: protects h2g channel */
>>>+	spinlock_t send_lock;
>>>	/** @pending_requests: list of requests waiting for response */
>>>	struct list_head pending_requests;
>>>@@ -73,6 +76,9 @@ struct intel_guc_ct {
>>>	/** @worker: worker for handling incoming requests */
>>>	struct work_struct worker;
>>>+
>>>+	/** @retry: the number of times a H2G CTB has been retried */
>>>+	u32 retry;
>>>};
>>>void intel_guc_ct_init_early(struct intel_guc_ct *ct);
>>_______________________________________________
>>Intel-gfx mailing list
>>Intel-gfx@lists.freedesktop.org
>>https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>_______________________________________________
>Intel-gfx mailing list
>Intel-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx