* [PATCH 0/3] drm/i915/guc: CTB improvements
@ 2019-11-20 23:56 ` John.C.Harrison
From: John.C.Harrison @ 2019-11-20 23:56 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

These patches improve the CTB (Command Transport Buffer)
infrastructure, the communication mechanism between i915 and GuC.

They are part of the (large) series for updating the i915 GuC
implementation to support the new GuC API. That series is still in
progress (but getting close). However, it was suggested that these
patches could be pushed early to help reduce the patch burden. They
are not directly related to the new GuC API and so are compatible with
the old GuC implementation.

The new GuC API makes much heavier use of the CTB. Indeed, it becomes
part of the command submission path. Hence, the need for optimisation,
larger buffers and support for sending without a mutex lock.

Matthew Brost (3):
  drm/i915/guc: Add non blocking CTB send function
  drm/i915/guc: Optimized CTB writes and reads
  drm/i915/guc: Increase size of CTB buffers

 drivers/gpu/drm/i915/gt/uc/intel_guc.h    |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 214 +++++++++++++++-------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  18 +-
 3 files changed, 162 insertions(+), 72 deletions(-)

-- 
2.21.0.5.gaeb582a983

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 1/3] drm/i915/guc: Add non blocking CTB send function
@ 2019-11-20 23:56   ` John.C.Harrison
From: John.C.Harrison @ 2019-11-20 23:56 UTC (permalink / raw)
  To: Intel-GFX

From: Matthew Brost <matthew.brost@intel.com>

Add a non-blocking CTB send function, intel_guc_send_nb. To support a
non-blocking CTB send function, a spin lock is needed to protect the
CTB descriptor fields. Also, the non-blocking call must not update the
fence value, as this value is owned by the blocking call
(intel_guc_send).
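
As an illustration of the fence ownership rule, the sketch below builds a
message header in which the "write fence" flag is optional, so a
non-blocking sender can leave it clear. It is a user-space sketch only: the
SKETCH_* bit positions and the sketch_header() helper are assumptions made
for this example and are not taken from the GuC interface headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only (not the GuC ABI). */
#define SKETCH_LEN_SHIFT	0
#define SKETCH_LEN_MASK		0x1fu
#define SKETCH_WRITE_FENCE	(1u << 8)
#define SKETCH_SEND_STATUS	(1u << 10)
#define SKETCH_ACTION_SHIFT	16

/*
 * Compose a header dword. The fence flag is set only when the caller
 * owns the fence (the blocking path); the non-blocking path passes
 * enable_fence = false so the receiver does not update the fence.
 */
static uint32_t sketch_header(uint32_t action, uint32_t len,
			      bool enable_fence, bool want_response)
{
	/* len must be non-zero and fit in the length field */
	assert(len && !(len & ~SKETCH_LEN_MASK));

	return (len << SKETCH_LEN_SHIFT) |
	       (enable_fence ? SKETCH_WRITE_FENCE : 0u) |
	       (want_response ? SKETCH_SEND_STATUS : 0u) |
	       (action << SKETCH_ACTION_SHIFT);
}
```

A blocking caller would pass enable_fence = true together with its next
fence value; a non-blocking caller would pass false and a dummy fence.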

The blocking CTB send now must have a flow control mechanism to ensure
the buffer isn't overrun. A lazy spin wait is used as we believe the
flow control condition should be rare with a properly sized buffer. A
retry counter is also implemented which fails H2G CTBs once a limit is
reached to prevent deadlock.
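
The flow control idea can be sketched outside the kernel as a ring buffer
whose sender checks for free space first and counts consecutive failures.
This is a minimal sketch under stated assumptions: struct sketch_ring,
sketch_space(), sketch_try_send() and the small SKETCH_MAX_RETRY limit are
hypothetical names chosen for the example. The free-space arithmetic is
meant to match what CIRC_SPACE() from <linux/circ_buf.h> computes for a
power-of-two size, with the producer offset in the first argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical limit, kept tiny so the behaviour is easy to exercise. */
#define SKETCH_MAX_RETRY 4u

struct sketch_ring {
	uint32_t head;	/* consumer offset, in bytes */
	uint32_t tail;	/* producer offset, in bytes */
	uint32_t size;	/* buffer size in bytes, must be a power of two */
	uint32_t retry;	/* consecutive failed send attempts */
};

/*
 * Free bytes in the ring. One byte is always left unused so that
 * head == tail can only mean "empty", never "full".
 */
static uint32_t sketch_space(const struct sketch_ring *r)
{
	assert(r->size && (r->size & (r->size - 1)) == 0);
	return (r->head - (r->tail + 1)) & (r->size - 1);
}

/*
 * Non-blocking style send of len_dw action dwords. The message takes
 * len_dw + 1 dwords because a fence dword follows the header.
 * Returns 0 on success, -EBUSY if the caller should retry later, or
 * -EDEADLK once the retry limit has been reached.
 */
static int sketch_try_send(struct sketch_ring *r, uint32_t len_dw)
{
	uint32_t bytes = (len_dw + 1) * 4;

	if (sketch_space(r) < bytes) {
		if (++r->retry >= SKETCH_MAX_RETRY)
			return -EDEADLK;
		return -EBUSY;
	}

	r->retry = 0;
	r->tail = (r->tail + bytes) & (r->size - 1);
	return 0;
}
```

A blocking variant would loop on the -EBUSY case with a relax hint between
attempts instead of returning it to the caller.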

The function, intel_guc_send_nb, is exported in this patch but unused.
Several patches later in the series make use of this function.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 97 +++++++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 10 ++-
 3 files changed, 91 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index e6400204a2bd..77c5af919ace 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -94,6 +94,8 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 	return guc->send(guc, action, len, response_buf, response_buf_size);
 }
 
+int intel_guc_send_nb(struct intel_guc_ct *ct, const u32 *action, u32 len);
+
 static inline void intel_guc_notify(struct intel_guc *guc)
 {
 	guc->notify(guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index b49115517510..e50d968b15d5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -3,6 +3,8 @@
  * Copyright © 2016-2019 Intel Corporation
  */
 
+#include <linux/circ_buf.h>
+
 #include "i915_drv.h"
 #include "intel_guc_ct.h"
 
@@ -12,6 +14,8 @@
 #define CT_DEBUG_DRIVER(...)	do { } while (0)
 #endif
 
+#define MAX_RETRY		0x1000000
+
 struct ct_request {
 	struct list_head link;
 	u32 fence;
@@ -40,7 +44,8 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
 	/* we're using static channel owners */
 	ct->host_channel.owner = CTB_OWNER_HOST;
 
-	spin_lock_init(&ct->lock);
+	spin_lock_init(&ct->request_lock);
+	spin_lock_init(&ct->send_lock);
 	INIT_LIST_HEAD(&ct->pending_requests);
 	INIT_LIST_HEAD(&ct->incoming_requests);
 	INIT_WORK(&ct->worker, ct_incoming_request_worker_func);
@@ -291,7 +296,8 @@ static u32 ctch_get_next_fence(struct intel_guc_ct_channel *ctch)
 static int ctb_write(struct intel_guc_ct_buffer *ctb,
 		     const u32 *action,
 		     u32 len /* in dwords */,
-		     u32 fence,
+		     u32 fence_value,
+		     bool enable_fence,
 		     bool want_response)
 {
 	struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -328,18 +334,18 @@ static int ctb_write(struct intel_guc_ct_buffer *ctb,
 	 * DW2+: action data
 	 */
 	header = (len << GUC_CT_MSG_LEN_SHIFT) |
-		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
+		 (enable_fence ? GUC_CT_MSG_WRITE_FENCE_TO_DESC : 0) |
 		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
 		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
 
 	CT_DEBUG_DRIVER("CT: writing %*ph %*ph %*ph\n",
-			4, &header, 4, &fence,
+			4, &header, 4, &fence_value,
 			4 * (len - 1), &action[1]);
 
 	cmds[tail] = header;
 	tail = (tail + 1) % size;
 
-	cmds[tail] = fence;
+	cmds[tail] = fence_value;
 	tail = (tail + 1) % size;
 
 	for (i = 1; i < len; i++) {
@@ -440,6 +446,47 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	return err;
 }
 
+static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32 len)
+{
+	u32 head = READ_ONCE(desc->head);
+	u32 space;
+
+	space = CIRC_SPACE(desc->tail, head, desc->size);
+
+	return space >= len;
+}
+
+int intel_guc_send_nb(struct intel_guc_ct *ct,
+		      const u32 *action,
+		      u32 len)
+{
+	struct intel_guc_ct_channel *ctch = &ct->host_channel;
+	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
+	struct guc_ct_buffer_desc *desc = ctb->desc;
+	int err;
+
+	GEM_BUG_ON(!ctch->enabled);
+	GEM_BUG_ON(!len);
+	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
+	lockdep_assert_held(&ct->send_lock);
+
+	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
+		ct->retry++;
+		if (ct->retry >= MAX_RETRY)
+			return -EDEADLK;
+		else
+			return -EBUSY;
+	}
+
+	ct->retry = 0;
+	err = ctb_write(ctb, action, len, 0, false, false);
+	if (unlikely(err))
+		return err;
+
+	intel_guc_notify(ct_to_guc(ct));
+	return 0;
+}
+
 static int ctch_send(struct intel_guc_ct *ct,
 		     struct intel_guc_ct_channel *ctch,
 		     const u32 *action,
@@ -460,17 +507,35 @@ static int ctch_send(struct intel_guc_ct *ct,
 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
 	GEM_BUG_ON(!response_buf && response_buf_size);
 
+	/*
+	 * We use a lazy spin wait loop here as we believe that if the CT
+	 * buffers are sized correctly the flow control condition should be
+	 * rare.
+	 */
+retry:
+	spin_lock_irqsave(&ct->send_lock, flags);
+	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
+		spin_unlock_irqrestore(&ct->send_lock, flags);
+		ct->retry++;
+		if (ct->retry >= MAX_RETRY)
+			return -EDEADLK;
+		cpu_relax();
+		goto retry;
+	}
+
+	ct->retry = 0;
 	fence = ctch_get_next_fence(ctch);
 	request.fence = fence;
 	request.status = 0;
 	request.response_len = response_buf_size;
 	request.response_buf = response_buf;
 
-	spin_lock_irqsave(&ct->lock, flags);
+	spin_lock(&ct->request_lock);
 	list_add_tail(&request.link, &ct->pending_requests);
-	spin_unlock_irqrestore(&ct->lock, flags);
+	spin_unlock(&ct->request_lock);
 
-	err = ctb_write(ctb, action, len, fence, !!response_buf);
+	err = ctb_write(ctb, action, len, fence, true, !!response_buf);
+	spin_unlock_irqrestore(&ct->send_lock, flags);
 	if (unlikely(err))
 		goto unlink;
 
@@ -501,9 +566,9 @@ static int ctch_send(struct intel_guc_ct *ct,
 	}
 
 unlink:
-	spin_lock_irqsave(&ct->lock, flags);
+	spin_lock_irqsave(&ct->request_lock, flags);
 	list_del(&request.link);
-	spin_unlock_irqrestore(&ct->lock, flags);
+	spin_unlock_irqrestore(&ct->request_lock, flags);
 
 	return err;
 }
@@ -653,7 +718,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
 
 	CT_DEBUG_DRIVER("CT: response fence %u status %#x\n", fence, status);
 
-	spin_lock(&ct->lock);
+	spin_lock(&ct->request_lock);
 	list_for_each_entry(req, &ct->pending_requests, link) {
 		if (unlikely(fence != req->fence)) {
 			CT_DEBUG_DRIVER("CT: request %u awaits response\n",
@@ -672,7 +737,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
 		found = true;
 		break;
 	}
-	spin_unlock(&ct->lock);
+	spin_unlock(&ct->request_lock);
 
 	if (!found)
 		DRM_ERROR("CT: unsolicited response %*ph\n", 4 * msglen, msg);
@@ -710,13 +775,13 @@ static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
 	u32 *payload;
 	bool done;
 
-	spin_lock_irqsave(&ct->lock, flags);
+	spin_lock_irqsave(&ct->request_lock, flags);
 	request = list_first_entry_or_null(&ct->incoming_requests,
 					   struct ct_incoming_request, link);
 	if (request)
 		list_del(&request->link);
 	done = !!list_empty(&ct->incoming_requests);
-	spin_unlock_irqrestore(&ct->lock, flags);
+	spin_unlock_irqrestore(&ct->request_lock, flags);
 
 	if (!request)
 		return true;
@@ -777,9 +842,9 @@ static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
 	}
 	memcpy(request->msg, msg, 4 * msglen);
 
-	spin_lock_irqsave(&ct->lock, flags);
+	spin_lock_irqsave(&ct->request_lock, flags);
 	list_add_tail(&request->link, &ct->incoming_requests);
-	spin_unlock_irqrestore(&ct->lock, flags);
+	spin_unlock_irqrestore(&ct->request_lock, flags);
 
 	queue_work(system_unbound_wq, &ct->worker);
 	return 0;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 7c24d83f5c24..bc670a796bd8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -62,8 +62,11 @@ struct intel_guc_ct {
 	struct intel_guc_ct_channel host_channel;
 	/* other channels are tbd */
 
-	/** @lock: protects pending requests list */
-	spinlock_t lock;
+	/** @request_lock: protects pending requests list */
+	spinlock_t request_lock;
+
+	/** @send_lock: protects h2g channel */
+	spinlock_t send_lock;
 
 	/** @pending_requests: list of requests waiting for response */
 	struct list_head pending_requests;
@@ -73,6 +76,9 @@ struct intel_guc_ct {
 
 	/** @worker: worker for handling incoming requests */
 	struct work_struct worker;
+
+	/** @retry: the number of times a H2G CTB has been retried */
+	u32 retry;
 };
 
 void intel_guc_ct_init_early(struct intel_guc_ct *ct);
-- 
2.21.0.5.gaeb582a983


* [PATCH 2/3] drm/i915/guc: Optimized CTB writes and reads
@ 2019-11-20 23:56   ` John.C.Harrison
From: John.C.Harrison @ 2019-11-20 23:56 UTC (permalink / raw)
  To: Intel-GFX

From: Matthew Brost <matthew.brost@intel.com>

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail, size) which could result in accesses across the PCIe
bus, store shadow local copies and only read/write the descriptor
values when absolutely necessary.
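
The shadow-copy approach can be sketched in user space as follows. The
producer keeps local copies of size, tail and free space, and only reads
the shared head when the cached space is too small. All names here
(struct sketch_desc, struct sketch_ctb, sketch_has_room(), sketch_commit()
and the desc_reads counter) are hypothetical and exist only for this
example; the volatile read stands in for an access that might cross the
bus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared descriptor; head is advanced by the consumer. */
struct sketch_desc {
	volatile uint32_t head;
	uint32_t tail;
	uint32_t size;
};

struct sketch_ctb {
	struct sketch_desc *desc;
	uint32_t size;		/* local shadow copy of size */
	uint32_t tail;		/* local shadow copy of tail */
	uint32_t space;		/* cached free space, in bytes */
	unsigned int desc_reads;	/* sketch-only: counts shared reads */
};

/* Free bytes for a power-of-two ring, one byte always left unused. */
static uint32_t sketch_circ_space(uint32_t producer, uint32_t consumer,
				  uint32_t size)
{
	assert(size && (size & (size - 1)) == 0);
	return (consumer - (producer + 1)) & (size - 1);
}

static bool sketch_has_room(struct sketch_ctb *ctb, uint32_t bytes)
{
	uint32_t head;

	/* Fast path: the cached value is a safe lower bound. */
	if (ctb->space >= bytes)
		return true;

	/* Slow path: refresh the cache from the shared descriptor. */
	head = ctb->desc->head;
	ctb->desc_reads++;
	ctb->space = sketch_circ_space(ctb->tail, head, ctb->size);

	return ctb->space >= bytes;
}

/* Account for a message of the given size after it has been written. */
static void sketch_commit(struct sketch_ctb *ctb, uint32_t bytes)
{
	ctb->tail = (ctb->tail + bytes) & (ctb->size - 1);
	ctb->desc->tail = ctb->tail;	/* publish the new tail */
	ctb->space -= bytes;
}
```

The cached space can only underestimate what is really free, because the
consumer only ever moves head forward, so skipping the shared read on the
fast path cannot cause an overrun in this sketch.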

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 79 +++++++++++------------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  8 +++
 2 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index e50d968b15d5..4d8a4c6afd71 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -68,23 +68,29 @@ static inline const char *guc_ct_buffer_type_to_str(u32 type)
 	}
 }
 
-static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
+static void guc_ct_buffer_desc_init(struct intel_guc_ct_buffer *ctb,
 				    u32 cmds_addr, u32 size, u32 owner)
 {
+	struct guc_ct_buffer_desc *desc = ctb->desc;
 	CT_DEBUG_DRIVER("CT: desc %p init addr=%#x size=%u owner=%u\n",
 			desc, cmds_addr, size, owner);
 	memset(desc, 0, sizeof(*desc));
 	desc->addr = cmds_addr;
-	desc->size = size;
+	ctb->size = desc->size = size;
 	desc->owner = owner;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
 }
 
-static void guc_ct_buffer_desc_reset(struct guc_ct_buffer_desc *desc)
+static void guc_ct_buffer_desc_reset(struct intel_guc_ct_buffer *ctb)
 {
+	struct guc_ct_buffer_desc *desc = ctb->desc;
 	CT_DEBUG_DRIVER("CT: desc %p reset head=%u tail=%u\n",
 			desc, desc->head, desc->tail);
-	desc->head = 0;
-	desc->tail = 0;
+	ctb->head = desc->head = 0;
+	ctb->tail = desc->tail = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
 	desc->is_in_error = 0;
 }
 
@@ -220,7 +226,7 @@ static int ctch_enable(struct intel_guc *guc,
 	 */
 	for (i = 0; i < ARRAY_SIZE(ctch->ctbs); i++) {
 		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
-		guc_ct_buffer_desc_init(ctch->ctbs[i].desc,
+		guc_ct_buffer_desc_init(&ctch->ctbs[i],
 					base + PAGE_SIZE/4 * i + PAGE_SIZE/2,
 					PAGE_SIZE/4,
 					ctch->owner);
@@ -301,32 +307,16 @@ static int ctb_write(struct intel_guc_ct_buffer *ctb,
 		     bool want_response)
 {
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head / 4;	/* in dwords */
-	u32 tail = desc->tail / 4;	/* in dwords */
-	u32 size = desc->size / 4;	/* in dwords */
-	u32 used;			/* in dwords */
+	u32 tail = ctb->tail / 4;	/* in dwords */
+	u32 size = ctb->size / 4;	/* in dwords */
 	u32 header;
 	u32 *cmds = ctb->cmds;
 	unsigned int i;
 
-	GEM_BUG_ON(desc->size % 4);
-	GEM_BUG_ON(desc->head % 4);
-	GEM_BUG_ON(desc->tail % 4);
+	GEM_BUG_ON(ctb->size % 4);
+	GEM_BUG_ON(ctb->tail % 4);
 	GEM_BUG_ON(tail >= size);
 
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the fence */
-	if (unlikely(used + len + 1 >= size))
-		return -ENOSPC;
-
 	/*
 	 * Write the message. The format is the following:
 	 * DW0: header (including action code)
@@ -354,15 +344,16 @@ static int ctb_write(struct intel_guc_ct_buffer *ctb,
 	}
 
 	/* now update desc tail (back in bytes) */
-	desc->tail = tail * 4;
-	GEM_BUG_ON(desc->tail > desc->size);
+	ctb->tail = desc->tail = tail * 4;
+	ctb->space -= (len + 1) * 4;
+	GEM_BUG_ON(ctb->tail > ctb->size);
 
 	return 0;
 }
 
 /**
  * wait_for_ctb_desc_update - Wait for the CT buffer descriptor update.
- * @desc:	buffer descriptor
+ * @ctb:	ctb buffer
  * @fence:	response fence
  * @status:	placeholder for status
  *
@@ -376,11 +367,12 @@ static int ctb_write(struct intel_guc_ct_buffer *ctb,
  * *	-ETIMEDOUT no response within hardcoded timeout
  * *	-EPROTO no response, CT buffer is in error
  */
-static int wait_for_ctb_desc_update(struct guc_ct_buffer_desc *desc,
+static int wait_for_ctb_desc_update(struct intel_guc_ct_buffer *ctb,
 				    u32 fence,
 				    u32 *status)
 {
 	int err;
+	struct guc_ct_buffer_desc *desc = ctb->desc;
 
 	/*
 	 * Fast commands should complete in less than 10us, so sample quickly
@@ -401,7 +393,7 @@ static int wait_for_ctb_desc_update(struct guc_ct_buffer_desc *desc,
 			/* Something went wrong with the messaging, try to reset
 			 * the buffer and hope for the best
 			 */
-			guc_ct_buffer_desc_reset(desc);
+			guc_ct_buffer_desc_reset(ctb);
 			err = -EPROTO;
 		}
 	}
@@ -446,12 +438,17 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	return err;
 }
 
-static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32 len)
+static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len)
 {
-	u32 head = READ_ONCE(desc->head);
+	u32 head;
 	u32 space;
 
-	space = CIRC_SPACE(desc->tail, head, desc->size);
+	if (ctb->space >= len)
+		return true;
+
+	head = READ_ONCE(ctb->desc->head);
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
 
 	return space >= len;
 }
@@ -462,7 +459,6 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_channel *ctch = &ct->host_channel;
 	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
-	struct guc_ct_buffer_desc *desc = ctb->desc;
 	int err;
 
 	GEM_BUG_ON(!ctch->enabled);
@@ -470,7 +466,7 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
 	lockdep_assert_held(&ct->send_lock);
 
-	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
+	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
 		ct->retry++;
 		if (ct->retry >= MAX_RETRY)
 			return -EDEADLK;
@@ -496,7 +492,6 @@ static int ctch_send(struct intel_guc_ct *ct,
 		     u32 *status)
 {
 	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
-	struct guc_ct_buffer_desc *desc = ctb->desc;
 	struct ct_request request;
 	unsigned long flags;
 	u32 fence;
@@ -514,7 +509,7 @@ static int ctch_send(struct intel_guc_ct *ct,
 	 */
 retry:
 	spin_lock_irqsave(&ct->send_lock, flags);
-	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
+	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
 		spin_unlock_irqrestore(&ct->send_lock, flags);
 		ct->retry++;
 		if (ct->retry >= MAX_RETRY)
@@ -544,7 +539,7 @@ static int ctch_send(struct intel_guc_ct *ct,
 	if (response_buf)
 		err = wait_for_ct_request_update(&request, status);
 	else
-		err = wait_for_ctb_desc_update(desc, fence, status);
+		err = wait_for_ctb_desc_update(ctb, fence, status);
 	if (unlikely(err))
 		goto unlink;
 
@@ -618,9 +613,9 @@ static inline bool ct_header_is_response(u32 header)
 static int ctb_read(struct intel_guc_ct_buffer *ctb, u32 *data)
 {
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head / 4;	/* in dwords */
+	u32 head = ctb->head / 4;	/* in dwords */
 	u32 tail = desc->tail / 4;	/* in dwords */
-	u32 size = desc->size / 4;	/* in dwords */
+	u32 size = ctb->size / 4;	/* in dwords */
 	u32 *cmds = ctb->cmds;
 	s32 available;			/* in dwords */
 	unsigned int len;
@@ -664,7 +659,7 @@ static int ctb_read(struct intel_guc_ct_buffer *ctb, u32 *data)
 	}
 	CT_DEBUG_DRIVER("CT: received %*ph\n", 4 * len, data);
 
-	desc->head = head * 4;
+	ctb->head = desc->head = head * 4;
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index bc670a796bd8..1bff4f0b91f7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -29,10 +29,18 @@ struct intel_guc;
  *
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
+ * @size: local shadow copy of size
+ * @head: local shadow copy of head
+ * @tail: local shadow copy of tail
+ * @space: local shadow copy of space
  */
 struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
+	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
 };
 
 /** Represents pair of command transport buffers.
-- 
2.21.0.5.gaeb582a983


* [Intel-gfx] [PATCH 2/3] drm/i915/guc: Optimized CTB writes and reads
@ 2019-11-20 23:56   ` John.C.Harrison
  0 siblings, 0 replies; 36+ messages in thread
From: John.C.Harrison @ 2019-11-20 23:56 UTC (permalink / raw)
  To: Intel-GFX

From: Matthew Brost <matthew.brost@intel.com>

CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail, size) which could result in accesses across the PCIe
bus, store shadow local copies and only read/write the descriptor
values when absolutely necessary.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 79 +++++++++++------------
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  8 +++
 2 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index e50d968b15d5..4d8a4c6afd71 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -68,23 +68,29 @@ static inline const char *guc_ct_buffer_type_to_str(u32 type)
 	}
 }
 
-static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
+static void guc_ct_buffer_desc_init(struct intel_guc_ct_buffer *ctb,
 				    u32 cmds_addr, u32 size, u32 owner)
 {
+	struct guc_ct_buffer_desc *desc = ctb->desc;
 	CT_DEBUG_DRIVER("CT: desc %p init addr=%#x size=%u owner=%u\n",
 			desc, cmds_addr, size, owner);
 	memset(desc, 0, sizeof(*desc));
 	desc->addr = cmds_addr;
-	desc->size = size;
+	ctb->size = desc->size = size;
 	desc->owner = owner;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
 }
 
-static void guc_ct_buffer_desc_reset(struct guc_ct_buffer_desc *desc)
+static void guc_ct_buffer_desc_reset(struct intel_guc_ct_buffer *ctb)
 {
+	struct guc_ct_buffer_desc *desc = ctb->desc;
 	CT_DEBUG_DRIVER("CT: desc %p reset head=%u tail=%u\n",
 			desc, desc->head, desc->tail);
-	desc->head = 0;
-	desc->tail = 0;
+	ctb->head = desc->head = 0;
+	ctb->tail = desc->tail = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
 	desc->is_in_error = 0;
 }
 
@@ -220,7 +226,7 @@ static int ctch_enable(struct intel_guc *guc,
 	 */
 	for (i = 0; i < ARRAY_SIZE(ctch->ctbs); i++) {
 		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
-		guc_ct_buffer_desc_init(ctch->ctbs[i].desc,
+		guc_ct_buffer_desc_init(&ctch->ctbs[i],
 					base + PAGE_SIZE/4 * i + PAGE_SIZE/2,
 					PAGE_SIZE/4,
 					ctch->owner);
@@ -301,32 +307,16 @@ static int ctb_write(struct intel_guc_ct_buffer *ctb,
 		     bool want_response)
 {
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head / 4;	/* in dwords */
-	u32 tail = desc->tail / 4;	/* in dwords */
-	u32 size = desc->size / 4;	/* in dwords */
-	u32 used;			/* in dwords */
+	u32 tail = ctb->tail / 4;	/* in dwords */
+	u32 size = ctb->size / 4;	/* in dwords */
 	u32 header;
 	u32 *cmds = ctb->cmds;
 	unsigned int i;
 
-	GEM_BUG_ON(desc->size % 4);
-	GEM_BUG_ON(desc->head % 4);
-	GEM_BUG_ON(desc->tail % 4);
+	GEM_BUG_ON(ctb->size % 4);
+	GEM_BUG_ON(ctb->tail % 4);
 	GEM_BUG_ON(tail >= size);
 
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the fence */
-	if (unlikely(used + len + 1 >= size))
-		return -ENOSPC;
-
 	/*
 	 * Write the message. The format is the following:
 	 * DW0: header (including action code)
@@ -354,15 +344,16 @@ static int ctb_write(struct intel_guc_ct_buffer *ctb,
 	}
 
 	/* now update desc tail (back in bytes) */
-	desc->tail = tail * 4;
-	GEM_BUG_ON(desc->tail > desc->size);
+	ctb->tail = desc->tail = tail * 4;
+	ctb->space -= (len + 1) * 4;
+	GEM_BUG_ON(ctb->tail > ctb->size);
 
 	return 0;
 }
 
 /**
  * wait_for_ctb_desc_update - Wait for the CT buffer descriptor update.
- * @desc:	buffer descriptor
+ * @ctb:	ctb buffer
  * @fence:	response fence
  * @status:	placeholder for status
  *
@@ -376,11 +367,12 @@ static int ctb_write(struct intel_guc_ct_buffer *ctb,
  * *	-ETIMEDOUT no response within hardcoded timeout
  * *	-EPROTO no response, CT buffer is in error
  */
-static int wait_for_ctb_desc_update(struct guc_ct_buffer_desc *desc,
+static int wait_for_ctb_desc_update(struct intel_guc_ct_buffer *ctb,
 				    u32 fence,
 				    u32 *status)
 {
 	int err;
+	struct guc_ct_buffer_desc *desc = ctb->desc;
 
 	/*
 	 * Fast commands should complete in less than 10us, so sample quickly
@@ -401,7 +393,7 @@ static int wait_for_ctb_desc_update(struct guc_ct_buffer_desc *desc,
 			/* Something went wrong with the messaging, try to reset
 			 * the buffer and hope for the best
 			 */
-			guc_ct_buffer_desc_reset(desc);
+			guc_ct_buffer_desc_reset(ctb);
 			err = -EPROTO;
 		}
 	}
@@ -446,12 +438,17 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	return err;
 }
 
-static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32 len)
+static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len)
 {
-	u32 head = READ_ONCE(desc->head);
+	u32 head;
 	u32 space;
 
-	space = CIRC_SPACE(desc->tail, head, desc->size);
+	if (ctb->space >= len)
+		return true;
+
+	head = READ_ONCE(ctb->desc->head);
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
 
 	return space >= len;
 }
@@ -462,7 +459,6 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_channel *ctch = &ct->host_channel;
 	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
-	struct guc_ct_buffer_desc *desc = ctb->desc;
 	int err;
 
 	GEM_BUG_ON(!ctch->enabled);
@@ -470,7 +466,7 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
 	lockdep_assert_held(&ct->send_lock);
 
-	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
+	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
 		ct->retry++;
 		if (ct->retry >= MAX_RETRY)
 			return -EDEADLK;
@@ -496,7 +492,6 @@ static int ctch_send(struct intel_guc_ct *ct,
 		     u32 *status)
 {
 	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
-	struct guc_ct_buffer_desc *desc = ctb->desc;
 	struct ct_request request;
 	unsigned long flags;
 	u32 fence;
@@ -514,7 +509,7 @@ static int ctch_send(struct intel_guc_ct *ct,
 	 */
 retry:
 	spin_lock_irqsave(&ct->send_lock, flags);
-	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
+	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
 		spin_unlock_irqrestore(&ct->send_lock, flags);
 		ct->retry++;
 		if (ct->retry >= MAX_RETRY)
@@ -544,7 +539,7 @@ static int ctch_send(struct intel_guc_ct *ct,
 	if (response_buf)
 		err = wait_for_ct_request_update(&request, status);
 	else
-		err = wait_for_ctb_desc_update(desc, fence, status);
+		err = wait_for_ctb_desc_update(ctb, fence, status);
 	if (unlikely(err))
 		goto unlink;
 
@@ -618,9 +613,9 @@ static inline bool ct_header_is_response(u32 header)
 static int ctb_read(struct intel_guc_ct_buffer *ctb, u32 *data)
 {
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head / 4;	/* in dwords */
+	u32 head = ctb->head / 4;	/* in dwords */
 	u32 tail = desc->tail / 4;	/* in dwords */
-	u32 size = desc->size / 4;	/* in dwords */
+	u32 size = ctb->size / 4;	/* in dwords */
 	u32 *cmds = ctb->cmds;
 	s32 available;			/* in dwords */
 	unsigned int len;
@@ -664,7 +659,7 @@ static int ctb_read(struct intel_guc_ct_buffer *ctb, u32 *data)
 	}
 	CT_DEBUG_DRIVER("CT: received %*ph\n", 4 * len, data);
 
-	desc->head = head * 4;
+	ctb->head = desc->head = head * 4;
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index bc670a796bd8..1bff4f0b91f7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -29,10 +29,18 @@ struct intel_guc;
  *
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
+ * @size: local shadow copy of size
+ * @head: local shadow copy of head
+ * @tail: local shadow copy of tail
+ * @space: locally cached free space in the buffer, in bytes
  */
 struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
+	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
 };
 
 /** Represents pair of command transport buffers.
-- 
2.21.0.5.gaeb582a983

* [PATCH 3/3] drm/i915/guc: Increase size of CTB buffers
@ 2019-11-20 23:56   ` John.C.Harrison
  0 siblings, 0 replies; 36+ messages in thread
From: John.C.Harrison @ 2019-11-20 23:56 UTC (permalink / raw)
  To: Intel-GFX

From: Matthew Brost <matthew.brost@intel.com>

With the introduction of non-blocking CTBs, more than one CTB can be in
flight at a time. Increasing the size of the CTBs should reduce how
often software hits the case where no space is available in the
buffer.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 50 +++++++++++++++--------
 1 file changed, 32 insertions(+), 18 deletions(-)
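
As a cross-check of the new layout, the offsets can be worked out in a
small standalone sketch. The EX_* names and helper functions below are
hypothetical and a 4 KiB page is assumed; the formulas mirror the
CTB_DESC_SIZE / CTB_H2G_BUFFER_SIZE / CTB_G2H_BUFFER_SIZE expressions used
in the diff. Under these assumptions the two descriptors share the first
page, the send buffer takes the second page, and the receive buffer takes
the last two, for a total of four pages.

```c
#include <assert.h>
#include <stddef.h>

#define EX_PAGE_SIZE	4096u			/* assumption: 4 KiB pages */
#define EX_DESC_SIZE	(EX_PAGE_SIZE / 2)	/* per-descriptor slot */
#define EX_H2G_SIZE	(EX_PAGE_SIZE)		/* send (host to GuC) */
#define EX_G2H_SIZE	(EX_H2G_SIZE * 2)	/* receive (GuC to host) */
#define EX_NUM_CTBS	2u

enum { EX_SEND = 0, EX_RECV = 1 };

/* Offset of descriptor i from the start of the blob. */
static size_t ex_desc_offset(unsigned int i)
{
	return EX_DESC_SIZE * i;
}

/* Offset of command buffer i: all descriptors first, then the buffers. */
static size_t ex_cmds_offset(unsigned int i)
{
	return EX_H2G_SIZE * i + EX_DESC_SIZE * EX_NUM_CTBS;
}

static size_t ex_cmds_size(unsigned int i)
{
	return i == EX_SEND ? EX_H2G_SIZE : EX_G2H_SIZE;
}

/* Size passed to the allocator in this sketch. */
static size_t ex_total_size(void)
{
	return EX_DESC_SIZE * EX_NUM_CTBS + EX_H2G_SIZE + EX_G2H_SIZE;
}

/* The regions must be back to back and end exactly at the total size. */
static void ex_check_layout(void)
{
	assert(ex_desc_offset(EX_RECV) + EX_DESC_SIZE ==
	       ex_cmds_offset(EX_SEND));
	assert(ex_cmds_offset(EX_SEND) + ex_cmds_size(EX_SEND) ==
	       ex_cmds_offset(EX_RECV));
	assert(ex_cmds_offset(EX_RECV) + ex_cmds_size(EX_RECV) ==
	       ex_total_size());
}
```

With 8 bytes per request, the 4096-byte send buffer in this sketch holds
at most 512 requests, which matches the figure quoted in the code comment.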

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 4d8a4c6afd71..31c512e7ecc2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -14,6 +14,10 @@
 #define CT_DEBUG_DRIVER(...)	do { } while (0)
 #endif
 
+#define CTB_DESC_SIZE		(PAGE_SIZE / 2)
+#define CTB_H2G_BUFFER_SIZE	(PAGE_SIZE)
+#define CTB_G2H_BUFFER_SIZE	(CTB_H2G_BUFFER_SIZE * 2)
+
 #define MAX_RETRY		0x1000000
 
 struct ct_request {
@@ -143,30 +147,35 @@ static int ctch_init(struct intel_guc *guc,
 
 	GEM_BUG_ON(ctch->vma);
 
-	/* We allocate 1 page to hold both descriptors and both buffers.
+	/* We allocate 4 pages to hold both descriptors and both buffers.
 	 *       ___________.....................
 	 *      |desc (SEND)|                   :
-	 *      |___________|                   PAGE/4
+	 *      |___________|                   PAGE/2
 	 *      :___________....................:
 	 *      |desc (RECV)|                   :
-	 *      |___________|                   PAGE/4
+	 *      |___________|                   PAGE/2
 	 *      :_______________________________:
 	 *      |cmds (SEND)                    |
-	 *      |                               PAGE/4
+	 *      |                               PAGE
 	 *      |_______________________________|
 	 *      |cmds (RECV)                    |
-	 *      |                               PAGE/4
+	 *      |                               PAGE * 2
 	 *      |_______________________________|
 	 *
 	 * Each message can use a maximum of 32 dwords and we don't expect to
-	 * have more than 1 in flight at any time, so we have enough space.
-	 * Some logic further ahead will rely on the fact that there is only 1
-	 * page and that it is always mapped, so if the size is changed the
-	 * other code will need updating as well.
+	 * have more than 1 in flight at any time, unless we are using GuC
+	 * submission. In that case each request requires a minimum of 8 bytes,
+	 * which gives us a maximum of 512 queued requests. Hopefully this is
+	 * enough space to avoid backpressure on the driver. We also double the
+	 * size of the receive buffer (relative to the send) to ensure a G2H
+	 * response CTB has a landing spot.
 	 */
 
 	/* allocate vma */
-	vma = intel_guc_allocate_vma(guc, PAGE_SIZE);
+	vma = intel_guc_allocate_vma(guc, CTB_DESC_SIZE *
+				     ARRAY_SIZE(ctch->ctbs) +
+				     CTB_H2G_BUFFER_SIZE +
+				     CTB_G2H_BUFFER_SIZE);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
 		goto err_out;
@@ -185,8 +194,9 @@ static int ctch_init(struct intel_guc *guc,
 	/* store pointers to desc and cmds */
 	for (i = 0; i < ARRAY_SIZE(ctch->ctbs); i++) {
 		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
-		ctch->ctbs[i].desc = blob + PAGE_SIZE/4 * i;
-		ctch->ctbs[i].cmds = blob + PAGE_SIZE/4 * i + PAGE_SIZE/2;
+		ctch->ctbs[i].desc = blob + CTB_DESC_SIZE * i;
+		ctch->ctbs[i].cmds = blob + CTB_H2G_BUFFER_SIZE * i +
+			CTB_DESC_SIZE * ARRAY_SIZE(ctch->ctbs);
 	}
 
 	return 0;
@@ -210,7 +220,7 @@ static void ctch_fini(struct intel_guc *guc,
 static int ctch_enable(struct intel_guc *guc,
 		       struct intel_guc_ct_channel *ctch)
 {
-	u32 base;
+	u32 base, size;
 	int err;
 	int i;
 
@@ -226,9 +236,12 @@ static int ctch_enable(struct intel_guc *guc,
 	 */
 	for (i = 0; i < ARRAY_SIZE(ctch->ctbs); i++) {
 		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
+		size = (i == CTB_SEND) ? CTB_H2G_BUFFER_SIZE :
+			CTB_G2H_BUFFER_SIZE;
 		guc_ct_buffer_desc_init(&ctch->ctbs[i],
-					base + PAGE_SIZE/4 * i + PAGE_SIZE/2,
-					PAGE_SIZE/4,
+					base + CTB_H2G_BUFFER_SIZE * i +
+					CTB_DESC_SIZE * ARRAY_SIZE(ctch->ctbs),
+					size,
 					ctch->owner);
 	}
 
@@ -236,13 +249,13 @@ static int ctch_enable(struct intel_guc *guc,
 	 * descriptors are in first half of the blob
 	 */
 	err = guc_action_register_ct_buffer(guc,
-					    base + PAGE_SIZE/4 * CTB_RECV,
+					    base + CTB_DESC_SIZE * CTB_RECV,
 					    INTEL_GUC_CT_BUFFER_TYPE_RECV);
 	if (unlikely(err))
 		goto err_out;
 
 	err = guc_action_register_ct_buffer(guc,
-					    base + PAGE_SIZE/4 * CTB_SEND,
+					    base + CTB_DESC_SIZE * CTB_SEND,
 					    INTEL_GUC_CT_BUFFER_TYPE_SEND);
 	if (unlikely(err))
 		goto err_deregister;
@@ -635,7 +648,8 @@ static int ctb_read(struct intel_guc_ct_buffer *ctb, u32 *data)
 	/* beware of buffer wrap case */
 	if (unlikely(available < 0))
 		available += size;
-	CT_DEBUG_DRIVER("CT: available %d (%u:%u)\n", available, head, tail);
+	CT_DEBUG_DRIVER("CT: available %d (%u:%u:%u)\n", available, head, tail,
+			size);
 	GEM_BUG_ON(available < 0);
 
 	data[0] = cmds[head];
-- 
2.21.0.5.gaeb582a983


* ✗ Fi.CI.CHECKPATCH: warning for drm/i915/guc: CTB improvements
@ 2019-11-21  2:49   ` Patchwork
  0 siblings, 0 replies; 36+ messages in thread
From: Patchwork @ 2019-11-21  2:49 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/guc: CTB improvements
URL   : https://patchwork.freedesktop.org/series/69788/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
3db17158f33d drm/i915/guc: Add non blocking CTB send function
-:6: WARNING:TYPO_SPELLING: 'fuction' may be misspelled - perhaps 'function'?
#6: 
Add non blocking CTB send fuction, intel_guc_send_nb. In order to

-:7: WARNING:TYPO_SPELLING: 'fuction' may be misspelled - perhaps 'function'?
#7: 
support a non blocking CTB send fuction a spin lock is needed to

total: 0 errors, 2 warnings, 0 checks, 223 lines checked
6022fc4c10a0 drm/i915/guc: Optimized CTB writes and reads
-:33: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#33: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c:79:
+	ctb->size = desc->size = size;

-:48: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#48: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c:91:
+	ctb->head = desc->head = 0;

-:49: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#49: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c:92:
+	ctb->tail = desc->tail = 0;

-:106: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#106: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c:347:
+	ctb->tail = desc->tail = tail * 4;

-:224: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#224: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c:662:
+	ctb->head = desc->head = head * 4;

total: 0 errors, 0 warnings, 5 checks, 213 lines checked
1c5b53a0ae0e drm/i915/guc: Increase size of CTB buffers


* ✓ Fi.CI.BAT: success for drm/i915/guc: CTB improvements
@ 2019-11-21  3:16   ` Patchwork
  0 siblings, 0 replies; 36+ messages in thread
From: Patchwork @ 2019-11-21  3:16 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/guc: CTB improvements
URL   : https://patchwork.freedesktop.org/series/69788/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_7394 -> Patchwork_15363
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/index.html

Known issues
------------

  Here are the changes found in Patchwork_15363 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_mmap_gtt@basic-read-write:
    - fi-glk-dsi:         [PASS][1] -> [INCOMPLETE][2] ([fdo#103359] / [k.org#198133])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/fi-glk-dsi/igt@gem_mmap_gtt@basic-read-write.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/fi-glk-dsi/igt@gem_mmap_gtt@basic-read-write.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-icl-guc:         [PASS][3] -> [FAIL][4] ([fdo#103167])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/fi-icl-guc/igt@kms_frontbuffer_tracking@basic.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/fi-icl-guc/igt@kms_frontbuffer_tracking@basic.html

  
#### Possible fixes ####

  * igt@gem_exec_gttfill@basic:
    - {fi-tgl-u}:         [INCOMPLETE][5] ([fdo#111593]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/fi-tgl-u/igt@gem_exec_gttfill@basic.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/fi-tgl-u/igt@gem_exec_gttfill@basic.html

  * igt@i915_selftest@live_execlists:
    - fi-whl-u:           [INCOMPLETE][7] ([fdo#112066]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/fi-whl-u/igt@i915_selftest@live_execlists.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/fi-whl-u/igt@i915_selftest@live_execlists.html

  * igt@i915_selftest@live_uncore:
    - fi-bxt-dsi:         [INCOMPLETE][9] ([fdo#103927]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/fi-bxt-dsi/igt@i915_selftest@live_uncore.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/fi-bxt-dsi/igt@i915_selftest@live_uncore.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-hsw-peppy:       [DMESG-WARN][11] ([fdo#102614]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/fi-hsw-peppy/igt@kms_frontbuffer_tracking@basic.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/fi-hsw-peppy/igt@kms_frontbuffer_tracking@basic.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#102614]: https://bugs.freedesktop.org/show_bug.cgi?id=102614
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103359]: https://bugs.freedesktop.org/show_bug.cgi?id=103359
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#111593]: https://bugs.freedesktop.org/show_bug.cgi?id=111593
  [fdo#112066]: https://bugs.freedesktop.org/show_bug.cgi?id=112066
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (51 -> 45)
------------------------------

  Missing    (6): fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_7394 -> Patchwork_15363

  CI-20190529: 20190529
  CI_DRM_7394: fdf3c3d9ba80a629caf1f76952ce619dc3dc8500 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5299: 65fed6a79adea14f7bef6d55530da47d7731d370 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_15363: 1c5b53a0ae0e400024133a668c905bc10175c78f @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

1c5b53a0ae0e drm/i915/guc: Increase size of CTB buffers
6022fc4c10a0 drm/i915/guc: Optimized CTB writes and reads
3db17158f33d drm/i915/guc: Add non blocking CTB send function

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/index.html

* Re: [PATCH 1/3] drm/i915/guc: Add non blocking CTB send function
@ 2019-11-21 11:43     ` Michal Wajdeczko
  0 siblings, 0 replies; 36+ messages in thread
From: Michal Wajdeczko @ 2019-11-21 11:43 UTC (permalink / raw)
  To: Intel-GFX, John.C.Harrison

On Thu, 21 Nov 2019 00:56:02 +0100, <John.C.Harrison@intel.com> wrote:

> From: Matthew Brost <matthew.brost@intel.com>
>
> Add non blocking CTB send fuction, intel_guc_send_nb. In order to
> support a non blocking CTB send fuction a spin lock is needed to

2x typos ("fuction" -> "function")

> protect the CTB descriptors fields. Also the non blocking call must not
> update the fence value as this value is owned by the blocking call
> (intel_guc_send).

you probably mean "intel_guc_send_ct", as intel_guc_send is just a wrapper
around guc->send

>
> The blocking CTB now must have a flow control mechanism to ensure the
> buffer isn't overrun. A lazy spin wait is used as we believe the flow
> control condition should be rare with properly sized buffer. A retry
> counter is also implemented which fails H2G CTBs once a limit is
> reached to prevent deadlock.
>
> The function, intel_guc_send_nb, is exported in this patch but unused.
> Several patches later in the series make use of this function.

It's likely in yet another series

>
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  2 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 97 +++++++++++++++++++----
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 10 ++-
>  3 files changed, 91 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h  
> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index e6400204a2bd..77c5af919ace 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -94,6 +94,8 @@ intel_guc_send_and_receive(struct intel_guc *guc,  
> const u32 *action, u32 len,
>  	return guc->send(guc, action, len, response_buf, response_buf_size);
>  }
> +int intel_guc_send_nb(struct intel_guc_ct *ct, const u32 *action, u32  
> len);
> +

Hmm, this mismatch of guc/ct parameters breaks our layering.
But we can keep this layering intact by introducing some flags to
the existing guc_send() function. These flags could be passed as
high bits in action[0], like this:

#define GUC_ACTION_FLAG_DONT_WAIT 0x80000000

int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
{
	u32 action[] = {
		INTEL_GUC_ACTION_AUTHENTICATE_HUC | GUC_ACTION_FLAG_DONT_WAIT,
		rsa_offset
	};

	return intel_guc_send(guc, action, ARRAY_SIZE(action));
}

then the actual back-end of guc->send can take the proper steps based on this flag:

@@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, u32 len,
         GEM_BUG_ON(!len);
         GEM_BUG_ON(len > guc->send_regs.count);

+       if (*action & GUC_ACTION_FLAG_DONT_WAIT)
+               return -EINVAL;
+       *action &= ~GUC_ACTION_FLAG_DONT_WAIT;
+
         /* We expect only action code */
         GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK);

@@ int intel_guc_send_ct(struct intel_guc *guc, const u32 *action, u32 len,
         u32 status = ~0; /* undefined */
         int ret;

+       if (*action & GUC_ACTION_FLAG_DONT_WAIT) {
+               GEM_BUG_ON(response_buf);
+               GEM_BUG_ON(response_buf_size);
+               return ctch_send_nb(ct, ctch, action, len);
+       }
+
         mutex_lock(&guc->send_mutex);

         ret = ctch_send(ct, ctch, action, len, response_buf,  
response_buf_size,
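
To make the idea concrete, here is a rough, self-contained userspace
sketch of the dispatch, not driver code. Everything in it is made up for
illustration: ACTION_FLAG_DONT_WAIT, ACTION_CODE_MASK (an assumed mask,
the driver's real one may differ), struct sent_msg, send_mmio() and
send_ct(). Since the action array is const, the sketch strips the flag
into a local value instead of clearing it in place.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag carried in the top bit of action[0]. */
#define ACTION_FLAG_DONT_WAIT 0x80000000u
/* Assumed mask for the real action code; the driver's mask may differ. */
#define ACTION_CODE_MASK      0x0000ffffu

enum send_path { PATH_NONE, PATH_BLOCKING, PATH_NONBLOCKING };

struct sent_msg {
	enum send_path path;	/* which path handled the message */
	uint32_t code;		/* action code with the flag stripped */
};

/* MMIO-style back-end: it cannot honour DONT_WAIT, so it rejects it. */
static int send_mmio(const uint32_t *action, size_t len, struct sent_msg *out)
{
	assert(len > 0);
	if (action[0] & ACTION_FLAG_DONT_WAIT)
		return -EINVAL;

	out->path = PATH_BLOCKING;
	out->code = action[0] & ACTION_CODE_MASK;
	return 0;
}

/* CT-style back-end: picks the path from the flag, then drops the flag. */
static int send_ct(const uint32_t *action, size_t len, struct sent_msg *out)
{
	assert(len > 0);
	/* The caller's array is const, so mask into a local copy. */
	out->code = action[0] & ~ACTION_FLAG_DONT_WAIT;
	out->path = (action[0] & ACTION_FLAG_DONT_WAIT) ?
		    PATH_NONBLOCKING : PATH_BLOCKING;
	return 0;
}
```

With this shape callers keep passing a guc-level action array, and only
the back-end decides whether the flag is supported.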


>  static inline void intel_guc_notify(struct intel_guc *guc)
>  {
>  	guc->notify(guc);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c  
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index b49115517510..e50d968b15d5 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -3,6 +3,8 @@
>   * Copyright © 2016-2019 Intel Corporation
>   */
> +#include <linux/circ_buf.h>
> +
>  #include "i915_drv.h"
>  #include "intel_guc_ct.h"
> @@ -12,6 +14,8 @@
>  #define CT_DEBUG_DRIVER(...)	do { } while (0)
>  #endif
> +#define MAX_RETRY		0x1000000
> +
>  struct ct_request {
>  	struct list_head link;
>  	u32 fence;
> @@ -40,7 +44,8 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>  	/* we're using static channel owners */
>  	ct->host_channel.owner = CTB_OWNER_HOST;
> -	spin_lock_init(&ct->lock);
> +	spin_lock_init(&ct->request_lock);
> +	spin_lock_init(&ct->send_lock);
>  	INIT_LIST_HEAD(&ct->pending_requests);
>  	INIT_LIST_HEAD(&ct->incoming_requests);
>  	INIT_WORK(&ct->worker, ct_incoming_request_worker_func);
> @@ -291,7 +296,8 @@ static u32 ctch_get_next_fence(struct  
> intel_guc_ct_channel *ctch)
>  static int ctb_write(struct intel_guc_ct_buffer *ctb,
>  		     const u32 *action,
>  		     u32 len /* in dwords */,
> -		     u32 fence,
> +		     u32 fence_value,
> +		     bool enable_fence,

maybe we can just guarantee that fence=0 will never be used as a valid
fence id, then this flag could be replaced with a (fence != 0) check.
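
A minimal sketch of that, in plain userspace C and under the assumption
that the fence is a simple wrapping counter. next_fence(),
build_header_flags() and HDR_WRITE_FENCE are hypothetical names, and the
bit value is a placeholder rather than the real
GUC_CT_MSG_WRITE_FENCE_TO_DESC:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder header bit; not the real GUC_CT_MSG_WRITE_FENCE_TO_DESC. */
#define HDR_WRITE_FENCE (1u << 0)

/* Hypothetical allocator: hands out 1, 2, ... and skips 0 when it wraps. */
static uint32_t next_fence(uint32_t *counter)
{
	if (++(*counter) == 0)
		++(*counter);
	return *counter;
}

/* With 0 reserved, "fence != 0" replaces a separate enable_fence flag. */
static uint32_t build_header_flags(uint32_t fence)
{
	return fence != 0 ? HDR_WRITE_FENCE : 0;
}
```

The non-blocking path would then simply pass fence 0.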

>  		     bool want_response)
>  {
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> @@ -328,18 +334,18 @@ static int ctb_write(struct intel_guc_ct_buffer  
> *ctb,
>  	 * DW2+: action data
>  	 */
>  	header = (len << GUC_CT_MSG_LEN_SHIFT) |
> -		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
> +		 (enable_fence ? GUC_CT_MSG_WRITE_FENCE_TO_DESC : 0) |

Hmm, even if we ask the fw not to write the fence back to the descriptor,
IIRC the current firmware will unconditionally write back the return status
of this non-blocking call, possibly overwriting the status of the blocked
call.

>  		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |

btw, if we switch all requests to expect a reply sent back over CTB,
then we can possibly drop the send_mutex in the CTB paths, and block
only when there is no DONT_WAIT flag and we have to wait for a response.

>  		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
> 	CT_DEBUG_DRIVER("CT: writing %*ph %*ph %*ph\n",
> -			4, &header, 4, &fence,
> +			4, &header, 4, &fence_value,
>  			4 * (len - 1), &action[1]);
> 	cmds[tail] = header;
>  	tail = (tail + 1) % size;
> -	cmds[tail] = fence;
> +	cmds[tail] = fence_value;
>  	tail = (tail + 1) % size;
> 	for (i = 1; i < len; i++) {
> @@ -440,6 +446,47 @@ static int wait_for_ct_request_update(struct  
> ct_request *req, u32 *status)
>  	return err;
>  }
> +static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32  
> len)
> +{
> +	u32 head = READ_ONCE(desc->head);
> +	u32 space;
> +
> +	space = CIRC_SPACE(desc->tail, head, desc->size);
> +
> +	return space >= len;
> +}
> +
> +int intel_guc_send_nb(struct intel_guc_ct *ct,
> +		      const u32 *action,
> +		      u32 len)
> +{
> +	struct intel_guc_ct_channel *ctch = &ct->host_channel;
> +	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
> +	struct guc_ct_buffer_desc *desc = ctb->desc;
> +	int err;
> +
> +	GEM_BUG_ON(!ctch->enabled);
> +	GEM_BUG_ON(!len);
> +	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
> +	lockdep_assert_held(&ct->send_lock);

hmm, does this mean that it's now the caller's responsibility to take the
CT private spinlock? That is not how the other guc_send() functions work.
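
For comparison, a rough userspace sketch of the alternative, where the
send function takes and releases the private lock itself. This is only an
illustration: struct toy_lock is a C11 atomic_flag stand-in for
spinlock_t, and struct toy_ct and toy_send_nb() are hypothetical names,
not the driver's.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Toy spinlock standing in for the kernel's spinlock_t. */
struct toy_lock {
	atomic_flag held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void toy_lock_release(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct toy_ct {
	struct toy_lock send_lock;	/* private to the CT layer */
	uint32_t space;			/* free dwords in the send buffer */
	uint32_t sent;			/* messages written so far */
};

/* Non-blocking send: locking is an internal detail, callers never see it. */
static int toy_send_nb(struct toy_ct *ct, uint32_t len)
{
	int ret = 0;

	assert(len > 0);
	toy_lock_acquire(&ct->send_lock);
	if (ct->space < len + 1) {	/* +1 dword for the fence slot */
		ret = -EBUSY;
	} else {
		ct->space -= len + 1;
		ct->sent++;
	}
	toy_lock_release(&ct->send_lock);
	return ret;
}
```

Callers then only deal with 0 or -EBUSY and never touch send_lock.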

> +
> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
> +		ct->retry++;
> +		if (ct->retry >= MAX_RETRY)
> +			return -EDEADLK;
> +		else
> +			return -EBUSY;
> +	}
> +
> +	ct->retry = 0;
> +	err = ctb_write(ctb, action, len, 0, false, false);
> +	if (unlikely(err))
> +		return err;
> +
> +	intel_guc_notify(ct_to_guc(ct));
> +	return 0;
> +}
> +
>  static int ctch_send(struct intel_guc_ct *ct,
>  		     struct intel_guc_ct_channel *ctch,
>  		     const u32 *action,
> @@ -460,17 +507,35 @@ static int ctch_send(struct intel_guc_ct *ct,
>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>  	GEM_BUG_ON(!response_buf && response_buf_size);
> +	/*
> +	 * We use a lazy spin wait loop here as we believe that if the CT
> +	 * buffers are sized correctly the flow control condition should be
> +	 * rare.
> +	 */
> +retry:
> +	spin_lock_irqsave(&ct->send_lock, flags);
> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
> +		spin_unlock_irqrestore(&ct->send_lock, flags);
> +		ct->retry++;
> +		if (ct->retry >= MAX_RETRY)
> +			return -EDEADLK;

I'm not sure which is better: a hidden deadlock that is hard to reproduce,
or deadlocks that are easier to catch and so help us become deadlock-clean.

> +		cpu_relax();
> +		goto retry;
> +	}
> +
> +	ct->retry = 0;
>  	fence = ctch_get_next_fence(ctch);
>  	request.fence = fence;
>  	request.status = 0;
>  	request.response_len = response_buf_size;
>  	request.response_buf = response_buf;
> -	spin_lock_irqsave(&ct->lock, flags);
> +	spin_lock(&ct->request_lock);
>  	list_add_tail(&request.link, &ct->pending_requests);
> -	spin_unlock_irqrestore(&ct->lock, flags);
> +	spin_unlock(&ct->request_lock);
> -	err = ctb_write(ctb, action, len, fence, !!response_buf);
> +	err = ctb_write(ctb, action, len, fence, true, !!response_buf);
> +	spin_unlock_irqrestore(&ct->send_lock, flags);
>  	if (unlikely(err))
>  		goto unlink;
> @@ -501,9 +566,9 @@ static int ctch_send(struct intel_guc_ct *ct,
>  	}
> unlink:
> -	spin_lock_irqsave(&ct->lock, flags);
> +	spin_lock_irqsave(&ct->request_lock, flags);
>  	list_del(&request.link);
> -	spin_unlock_irqrestore(&ct->lock, flags);
> +	spin_unlock_irqrestore(&ct->request_lock, flags);
> 	return err;
>  }
> @@ -653,7 +718,7 @@ static int ct_handle_response(struct intel_guc_ct  
> *ct, const u32 *msg)
> 	CT_DEBUG_DRIVER("CT: response fence %u status %#x\n", fence, status);
> -	spin_lock(&ct->lock);
> +	spin_lock(&ct->request_lock);
>  	list_for_each_entry(req, &ct->pending_requests, link) {
>  		if (unlikely(fence != req->fence)) {
>  			CT_DEBUG_DRIVER("CT: request %u awaits response\n",
> @@ -672,7 +737,7 @@ static int ct_handle_response(struct intel_guc_ct  
> *ct, const u32 *msg)
>  		found = true;
>  		break;
>  	}
> -	spin_unlock(&ct->lock);
> +	spin_unlock(&ct->request_lock);
> 	if (!found)
>  		DRM_ERROR("CT: unsolicited response %*ph\n", 4 * msglen, msg);
> @@ -710,13 +775,13 @@ static bool ct_process_incoming_requests(struct  
> intel_guc_ct *ct)
>  	u32 *payload;
>  	bool done;
> -	spin_lock_irqsave(&ct->lock, flags);
> +	spin_lock_irqsave(&ct->request_lock, flags);
>  	request = list_first_entry_or_null(&ct->incoming_requests,
>  					   struct ct_incoming_request, link);
>  	if (request)
>  		list_del(&request->link);
>  	done = !!list_empty(&ct->incoming_requests);
> -	spin_unlock_irqrestore(&ct->lock, flags);
> +	spin_unlock_irqrestore(&ct->request_lock, flags);
> 	if (!request)
>  		return true;
> @@ -777,9 +842,9 @@ static int ct_handle_request(struct intel_guc_ct  
> *ct, const u32 *msg)
>  	}
>  	memcpy(request->msg, msg, 4 * msglen);
> -	spin_lock_irqsave(&ct->lock, flags);
> +	spin_lock_irqsave(&ct->request_lock, flags);
>  	list_add_tail(&request->link, &ct->incoming_requests);
> -	spin_unlock_irqrestore(&ct->lock, flags);
> +	spin_unlock_irqrestore(&ct->request_lock, flags);
> 	queue_work(system_unbound_wq, &ct->worker);
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h  
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 7c24d83f5c24..bc670a796bd8 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -62,8 +62,11 @@ struct intel_guc_ct {
>  	struct intel_guc_ct_channel host_channel;
>  	/* other channels are tbd */
> -	/** @lock: protects pending requests list */
> -	spinlock_t lock;
> +	/** @request_lock: protects pending requests list */
> +	spinlock_t request_lock;
> +
> +	/** @send_lock: protects h2g channel */
> +	spinlock_t send_lock;
> 	/** @pending_requests: list of requests waiting for response */
>  	struct list_head pending_requests;
> @@ -73,6 +76,9 @@ struct intel_guc_ct {
> 	/** @worker: worker for handling incoming requests */
>  	struct work_struct worker;
> +
> +	/** @retry: the number of times a H2G CTB has been retried */
> +	u32 retry;
>  };
> void intel_guc_ct_init_early(struct intel_guc_ct *ct);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] drm/i915/guc: Optimized CTB writes and reads
@ 2019-11-21 11:58     ` Michal Wajdeczko
  0 siblings, 0 replies; 36+ messages in thread
From: Michal Wajdeczko @ 2019-11-21 11:58 UTC (permalink / raw)
  To: Intel-GFX, John.C.Harrison

On Thu, 21 Nov 2019 00:56:03 +0100, <John.C.Harrison@intel.com> wrote:

> From: Matthew Brost <matthew.brost@intel.com>
>
> CTB writes are now in the path of command submission and should be
> optimized for performance. Rather than reading CTB descriptor values
> (e.g. head, tail, size) which could result in accesses across the PCIe
> bus, store shadow local copies and only read/write the descriptor
> values when absolutely necessary.
>
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 79 +++++++++++------------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  8 +++
>  2 files changed, 45 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c  
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index e50d968b15d5..4d8a4c6afd71 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -68,23 +68,29 @@ static inline const char  
> *guc_ct_buffer_type_to_str(u32 type)
>  	}
>  }
> -static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
> +static void guc_ct_buffer_desc_init(struct intel_guc_ct_buffer *ctb,
>  				    u32 cmds_addr, u32 size, u32 owner)

as this function now takes ctb instead of desc, it should be renamed
or kept separate from guc_ct_buffer_desc_init

>  {
> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>  	CT_DEBUG_DRIVER("CT: desc %p init addr=%#x size=%u owner=%u\n",
>  			desc, cmds_addr, size, owner);
>  	memset(desc, 0, sizeof(*desc));
>  	desc->addr = cmds_addr;
> -	desc->size = size;
> +	ctb->size = desc->size = size;
>  	desc->owner = owner;
> +	ctb->tail = 0;
> +	ctb->head = 0;
> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>  }
> -static void guc_ct_buffer_desc_reset(struct guc_ct_buffer_desc *desc)
> +static void guc_ct_buffer_desc_reset(struct intel_guc_ct_buffer *ctb)

same here

>  {
> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>  	CT_DEBUG_DRIVER("CT: desc %p reset head=%u tail=%u\n",
>  			desc, desc->head, desc->tail);
> -	desc->head = 0;
> -	desc->tail = 0;
> +	ctb->head = desc->head = 0;
> +	ctb->tail = desc->tail = 0;
> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>  	desc->is_in_error = 0;
>  }
> @@ -220,7 +226,7 @@ static int ctch_enable(struct intel_guc *guc,
>  	 */
>  	for (i = 0; i < ARRAY_SIZE(ctch->ctbs); i++) {
>  		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
> -		guc_ct_buffer_desc_init(ctch->ctbs[i].desc,
> +		guc_ct_buffer_desc_init(&ctch->ctbs[i],
>  					base + PAGE_SIZE/4 * i + PAGE_SIZE/2,
>  					PAGE_SIZE/4,
>  					ctch->owner);
> @@ -301,32 +307,16 @@ static int ctb_write(struct intel_guc_ct_buffer  
> *ctb,
>  		     bool want_response)
>  {
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head / 4;	/* in dwords */
> -	u32 tail = desc->tail / 4;	/* in dwords */
> -	u32 size = desc->size / 4;	/* in dwords */
> -	u32 used;			/* in dwords */
> +	u32 tail = ctb->tail / 4;	/* in dwords */
> +	u32 size = ctb->size / 4;	/* in dwords */
>  	u32 header;
>  	u32 *cmds = ctb->cmds;
>  	unsigned int i;
> -	GEM_BUG_ON(desc->size % 4);
> -	GEM_BUG_ON(desc->head % 4);
> -	GEM_BUG_ON(desc->tail % 4);
> +	GEM_BUG_ON(ctb->size % 4);
> +	GEM_BUG_ON(ctb->tail % 4);
>  	GEM_BUG_ON(tail >= size);
> -	/*
> -	 * tail == head condition indicates empty. GuC FW does not support
> -	 * using up the entire buffer to get tail == head meaning full.
> -	 */
> -	if (tail < head)
> -		used = (size - head) + tail;
> -	else
> -		used = tail - head;
> -
> -	/* make sure there is a space including extra dw for the fence */
> -	if (unlikely(used + len + 1 >= size))
> -		return -ENOSPC;
> -
>  	/*
>  	 * Write the message. The format is the following:
>  	 * DW0: header (including action code)
> @@ -354,15 +344,16 @@ static int ctb_write(struct intel_guc_ct_buffer  
> *ctb,
>  	}
> 	/* now update desc tail (back in bytes) */
> -	desc->tail = tail * 4;
> -	GEM_BUG_ON(desc->tail > desc->size);
> +	ctb->tail = desc->tail = tail * 4;
> +	ctb->space -= (len + 1) * 4;
> +	GEM_BUG_ON(ctb->tail > ctb->size);
> 	return 0;
>  }
> /**
>   * wait_for_ctb_desc_update - Wait for the CT buffer descriptor update.
> - * @desc:	buffer descriptor
> + * @ctb:	ctb buffer
>   * @fence:	response fence
>   * @status:	placeholder for status
>   *
> @@ -376,11 +367,12 @@ static int ctb_write(struct intel_guc_ct_buffer  
> *ctb,
>   * *	-ETIMEDOUT no response within hardcoded timeout
>   * *	-EPROTO no response, CT buffer is in error
>   */
> -static int wait_for_ctb_desc_update(struct guc_ct_buffer_desc *desc,
> +static int wait_for_ctb_desc_update(struct intel_guc_ct_buffer *ctb,
>  				    u32 fence,
>  				    u32 *status)
>  {
>  	int err;
> +	struct guc_ct_buffer_desc *desc = ctb->desc;
> 	/*
>  	 * Fast commands should complete in less than 10us, so sample quickly
> @@ -401,7 +393,7 @@ static int wait_for_ctb_desc_update(struct  
> guc_ct_buffer_desc *desc,
>  			/* Something went wrong with the messaging, try to reset
>  			 * the buffer and hope for the best
>  			 */
> -			guc_ct_buffer_desc_reset(desc);
> +			guc_ct_buffer_desc_reset(ctb);
>  			err = -EPROTO;
>  		}
>  	}
> @@ -446,12 +438,17 @@ static int wait_for_ct_request_update(struct  
> ct_request *req, u32 *status)
>  	return err;
>  }
> -static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32  
> len)
> +static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32  
> len)
>  {
> -	u32 head = READ_ONCE(desc->head);
> +	u32 head;
>  	u32 space;
> -	space = CIRC_SPACE(desc->tail, head, desc->size);
> +	if (ctb->space >= len)
> +		return true;
> +
> +	head = READ_ONCE(ctb->desc->head);
> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
> +	ctb->space = space;
> 	return space >= len;
>  }
> @@ -462,7 +459,6 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
>  {
>  	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>  	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
> -	struct guc_ct_buffer_desc *desc = ctb->desc;
>  	int err;
> 	GEM_BUG_ON(!ctch->enabled);
> @@ -470,7 +466,7 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
>  	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>  	lockdep_assert_held(&ct->send_lock);
> -	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
> +	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
>  		ct->retry++;
>  		if (ct->retry >= MAX_RETRY)
>  			return -EDEADLK;
> @@ -496,7 +492,6 @@ static int ctch_send(struct intel_guc_ct *ct,
>  		     u32 *status)
>  {
>  	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
> -	struct guc_ct_buffer_desc *desc = ctb->desc;
>  	struct ct_request request;
>  	unsigned long flags;
>  	u32 fence;
> @@ -514,7 +509,7 @@ static int ctch_send(struct intel_guc_ct *ct,
>  	 */
>  retry:
>  	spin_lock_irqsave(&ct->send_lock, flags);
> -	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
> +	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
>  		spin_unlock_irqrestore(&ct->send_lock, flags);
>  		ct->retry++;
>  		if (ct->retry >= MAX_RETRY)
> @@ -544,7 +539,7 @@ static int ctch_send(struct intel_guc_ct *ct,
>  	if (response_buf)
>  		err = wait_for_ct_request_update(&request, status);
>  	else
> -		err = wait_for_ctb_desc_update(desc, fence, status);
> +		err = wait_for_ctb_desc_update(ctb, fence, status);
>  	if (unlikely(err))
>  		goto unlink;
> @@ -618,9 +613,9 @@ static inline bool ct_header_is_response(u32 header)
>  static int ctb_read(struct intel_guc_ct_buffer *ctb, u32 *data)
>  {
>  	struct guc_ct_buffer_desc *desc = ctb->desc;
> -	u32 head = desc->head / 4;	/* in dwords */
> +	u32 head = ctb->head / 4;	/* in dwords */
>  	u32 tail = desc->tail / 4;	/* in dwords */
> -	u32 size = desc->size / 4;	/* in dwords */
> +	u32 size = ctb->size / 4;	/* in dwords */
>  	u32 *cmds = ctb->cmds;
>  	s32 available;			/* in dwords */
>  	unsigned int len;
> @@ -664,7 +659,7 @@ static int ctb_read(struct intel_guc_ct_buffer *ctb,  
> u32 *data)
>  	}
>  	CT_DEBUG_DRIVER("CT: received %*ph\n", 4 * len, data);
> -	desc->head = head * 4;
> +	ctb->head = desc->head = head * 4;
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h  
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index bc670a796bd8..1bff4f0b91f7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -29,10 +29,18 @@ struct intel_guc;
>   *
>   * @desc: pointer to the buffer descriptor
>   * @cmds: pointer to the commands buffer
> + * @size: local shadow copy of size

I would rather expect this to be the real, fixed size;
note that size is not expected to change.

> + * @head: local shadow copy of head
> + * @tail: local shadow copy of tail
> + * @space: local shadow copy of space
>   */
>  struct intel_guc_ct_buffer {
>  	struct guc_ct_buffer_desc *desc;
>  	u32 *cmds;
> +	u32 size;
> +	u32 tail;
> +	u32 head;
> +	u32 space;
>  };
> /** Represents pair of command transport buffers.

Can we reorder this patch to be first in the series?

Michal

* Re: [PATCH 3/3] drm/i915/guc: Increase size of CTB buffers
@ 2019-11-21 12:25     ` Michal Wajdeczko
  0 siblings, 0 replies; 36+ messages in thread
From: Michal Wajdeczko @ 2019-11-21 12:25 UTC (permalink / raw)
  To: Intel-GFX, John.C.Harrison

On Thu, 21 Nov 2019 00:56:04 +0100, <John.C.Harrison@intel.com> wrote:

> From: Matthew Brost <matthew.brost@intel.com>
>
> With the introduction of non-blocking CTBs more than one CTB can be in
> flight at a time. Increasing the size of the CTBs should reduce how
> often software hits the case where no space is available in the CTB
> buffer.
>
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 50 +++++++++++++++--------
>  1 file changed, 32 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c  
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 4d8a4c6afd71..31c512e7ecc2 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -14,6 +14,10 @@
>  #define CT_DEBUG_DRIVER(...)	do { } while (0)
>  #endif
> +#define CTB_DESC_SIZE		(PAGE_SIZE / 2)

CTB descriptors are now 64B each, so I'm not sure why we want to waste a
whole page on them. Maybe to make better use of the space (and be ready for
upcoming changes) we can place the buffer right after the descriptor:

  *       <------ DESCRIPTOR ------> <--- BUFFER ------------->
  *
  *      +--------------------------+--------------------------+
  *      | addr | head | tail | ... |                          |
  *      +--------------------------+--------------------------+
  *          \                      ^
  *           \____________________/
  *
  *       <------------ 64B -------> <---- n * PAGE - 64B ---->
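
The layout in the diagram could be computed roughly as follows. This is
only a sketch: the model_* names are hypothetical, the 64-byte descriptor
size is taken from the comment above, and a 4 KiB page is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE	4096u	/* assumption: 4 KiB pages */
#define MODEL_DESC_SIZE	64u	/* descriptor size quoted in the review */

/* Offsets, in bytes, of one CTB whose buffer directly follows its
 * descriptor inside an allocation of "pages" pages starting at "base". */
struct model_ctb_layout {
	uint32_t desc_offset;
	uint32_t cmds_offset;
	uint32_t cmds_size;
};

static struct model_ctb_layout model_ctb_layout(uint32_t base, uint32_t pages)
{
	struct model_ctb_layout l;

	assert(pages > 0);
	l.desc_offset = base;
	l.cmds_offset = base + MODEL_DESC_SIZE;
	l.cmds_size = pages * MODEL_PAGE_SIZE - MODEL_DESC_SIZE;
	return l;
}
```

One consequence worth noting: a buffer of n * PAGE - 64B is not a power of
two, so the mask-based CIRC_SPACE() arithmetic used in patch 2/3 would
presumably have to become modulo arithmetic under this layout.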

> +#define CTB_H2G_BUFFER_SIZE	(PAGE_SIZE)
> +#define CTB_G2H_BUFFER_SIZE	(CTB_H2G_BUFFER_SIZE * 2)
> +
>  #define MAX_RETRY		0x1000000
> struct ct_request {
> @@ -143,30 +147,35 @@ static int ctch_init(struct intel_guc *guc,
> 	GEM_BUG_ON(ctch->vma);
> -	/* We allocate 1 page to hold both descriptors and both buffers.
> +	/* We allocate 3 pages to hold both descriptors and both buffers.
>  	 *       ___________.....................
>  	 *      |desc (SEND)|                   :
> -	 *      |___________|                   PAGE/4
> +	 *      |___________|                   PAGE/2
>  	 *      :___________....................:
>  	 *      |desc (RECV)|                   :
> -	 *      |___________|                   PAGE/4
> +	 *      |___________|                   PAGE/2
>  	 *      :_______________________________:
>  	 *      |cmds (SEND)                    |
> -	 *      |                               PAGE/4
> +	 *      |                               PAGE
>  	 *      |_______________________________|
>  	 *      |cmds (RECV)                    |
> -	 *      |                               PAGE/4
> +	 *      |                               PAGE * 2
>  	 *      |_______________________________|
>  	 *
>  	 * Each message can use a maximum of 32 dwords and we don't expect to
> -	 * have more than 1 in flight at any time, so we have enough space.
> -	 * Some logic further ahead will rely on the fact that there is only 1
> -	 * page and that it is always mapped, so if the size is changed the
> -	 * other code will need updating as well.
> +	 * have more than 1 in flight at any time, unless we are using the GuC
> +	 * submission. In that case each request requires a minimum 8 bytes
> +	 * which gives us a maximum 512 queue'd requests. Hopefully this enough

Hmm, do we really expect to have 512 messages in flight?
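
For reference, the 512 figure appears to come from dividing the one-page
send buffer by the 8-byte minimum message. A small sketch of that
arithmetic, assuming a 4 KiB page; the model_* names are hypothetical, and
the result is nominal because a ring that keeps one slot free holds
slightly less.

```c
#include <stdint.h>

#define MODEL_H2G_BUFFER_SIZE	4096u		/* assumption: PAGE_SIZE is 4 KiB */
#define MODEL_MIN_MSG_BYTES	8u		/* minimum request size from the comment */
#define MODEL_MAX_MSG_BYTES	(32u * 4u)	/* 32 dwords, the stated maximum */

/* Nominal number of msg_bytes-sized messages that fit in buf_bytes. */
static uint32_t model_max_queued(uint32_t buf_bytes, uint32_t msg_bytes)
{
	return buf_bytes / msg_bytes;
}
```

The same buffer holds nominally only 32 maximum-size messages, so the
realistic queue depth depends heavily on the message mix.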

> +	 * space to avoid backpressure on the driver. We also double the size  
> of
> +	 * the receive buffer (relative to the send) to ensure a g2h response
> +	 * CTB has a landing spot.

We do plan to send non-blocking messages that might generate higher traffic,
but do we expect a matching increase in incoming traffic? What kind of data
will be there? And do we expect that the driver will be unable to consume
them in a timely manner?

>  	 */
> 	/* allocate vma */
> -	vma = intel_guc_allocate_vma(guc, PAGE_SIZE);
> +	vma = intel_guc_allocate_vma(guc, CTB_DESC_SIZE *
> +				     ARRAY_SIZE(ctch->ctbs) +
> +				     CTB_H2G_BUFFER_SIZE +
> +				     CTB_G2H_BUFFER_SIZE);
>  	if (IS_ERR(vma)) {
>  		err = PTR_ERR(vma);
>  		goto err_out;
> @@ -185,8 +194,9 @@ static int ctch_init(struct intel_guc *guc,
>  	/* store pointers to desc and cmds */
>  	for (i = 0; i < ARRAY_SIZE(ctch->ctbs); i++) {
>  		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
> -		ctch->ctbs[i].desc = blob + PAGE_SIZE/4 * i;
> -		ctch->ctbs[i].cmds = blob + PAGE_SIZE/4 * i + PAGE_SIZE/2;
> +		ctch->ctbs[i].desc = blob + CTB_DESC_SIZE * i;
> +		ctch->ctbs[i].cmds = blob + CTB_H2G_BUFFER_SIZE * i +
> +			CTB_DESC_SIZE * ARRAY_SIZE(ctch->ctbs);
>  	}
> 	return 0;
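
To make the new offsets easier to check, here is a small sketch that
restates the macros from this patch and evaluates them. It assumes a 4 KiB
page and that CTB_SEND is index 0 and CTB_RECV is index 1 (consistent with
the diagram, but not shown in this hunk); the MODEL_ and model_ names are
hypothetical.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE			4096u	/* assumption: 4 KiB pages */
#define MODEL_CTB_DESC_SIZE		(MODEL_PAGE_SIZE / 2)
#define MODEL_CTB_H2G_BUFFER_SIZE	(MODEL_PAGE_SIZE)
#define MODEL_CTB_G2H_BUFFER_SIZE	(MODEL_CTB_H2G_BUFFER_SIZE * 2)
#define MODEL_NUM_CTBS			2u

enum { MODEL_CTB_SEND = 0, MODEL_CTB_RECV = 1 };	/* assumed ordering */

/* Byte offset of descriptor i within the blob. */
static uint32_t model_desc_offset(uint32_t i)
{
	return MODEL_CTB_DESC_SIZE * i;
}

/* Byte offset of command buffer i; buffers start after both descriptors. */
static uint32_t model_cmds_offset(uint32_t i)
{
	return MODEL_CTB_H2G_BUFFER_SIZE * i +
	       MODEL_CTB_DESC_SIZE * MODEL_NUM_CTBS;
}

/* Size passed to the allocation in the hunk above. */
static uint32_t model_total_size(void)
{
	return MODEL_CTB_DESC_SIZE * MODEL_NUM_CTBS +
	       MODEL_CTB_H2G_BUFFER_SIZE +
	       MODEL_CTB_G2H_BUFFER_SIZE;
}
```

If this arithmetic is right, the allocation comes to four pages (16384
bytes) rather than the three mentioned in the updated comment: one page for
the two descriptors, one for the send buffer and two for the receive
buffer.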
> @@ -210,7 +220,7 @@ static void ctch_fini(struct intel_guc *guc,
>  static int ctch_enable(struct intel_guc *guc,
>  		       struct intel_guc_ct_channel *ctch)
>  {
> -	u32 base;
> +	u32 base, size;
>  	int err;
>  	int i;
> @@ -226,9 +236,12 @@ static int ctch_enable(struct intel_guc *guc,
>  	 */
>  	for (i = 0; i < ARRAY_SIZE(ctch->ctbs); i++) {
>  		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
> +		size = (i == CTB_SEND) ? CTB_H2G_BUFFER_SIZE :
> +			CTB_G2H_BUFFER_SIZE;
>  		guc_ct_buffer_desc_init(&ctch->ctbs[i],
> -					base + PAGE_SIZE/4 * i + PAGE_SIZE/2,
> -					PAGE_SIZE/4,
> +					base + CTB_H2G_BUFFER_SIZE * i +
> +					CTB_DESC_SIZE * ARRAY_SIZE(ctch->ctbs),
> +					size,
>  					ctch->owner);
>  	}
> @@ -236,13 +249,13 @@ static int ctch_enable(struct intel_guc *guc,
>  	 * descriptors are in first half of the blob
>  	 */
>  	err = guc_action_register_ct_buffer(guc,
> -					    base + PAGE_SIZE/4 * CTB_RECV,
> +					    base + CTB_DESC_SIZE * CTB_RECV,
>  					    INTEL_GUC_CT_BUFFER_TYPE_RECV);
>  	if (unlikely(err))
>  		goto err_out;
> 	err = guc_action_register_ct_buffer(guc,
> -					    base + PAGE_SIZE/4 * CTB_SEND,
> +					    base + CTB_DESC_SIZE * CTB_SEND,
>  					    INTEL_GUC_CT_BUFFER_TYPE_SEND);
>  	if (unlikely(err))
>  		goto err_deregister;
> @@ -635,7 +648,8 @@ static int ctb_read(struct intel_guc_ct_buffer *ctb,  
> u32 *data)
>  	/* beware of buffer wrap case */
>  	if (unlikely(available < 0))
>  		available += size;
> -	CT_DEBUG_DRIVER("CT: available %d (%u:%u)\n", available, head, tail);
> +	CT_DEBUG_DRIVER("CT: available %d (%u:%u:%d)\n", available, head, tail,
> +			size);

size will not change, so I'm not sure it is worth repeating it in every log
message

>  	GEM_BUG_ON(available < 0);
> 	data[0] = cmds[head];

* Re: [PATCH 2/3] drm/i915/guc: Optimized CTB writes and reads
@ 2019-11-21 15:56       ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2019-11-21 15:56 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: Intel-GFX

On Thu, Nov 21, 2019 at 12:58:50PM +0100, Michal Wajdeczko wrote:
>On Thu, 21 Nov 2019 00:56:03 +0100, <John.C.Harrison@intel.com> wrote:
>
>>From: Matthew Brost <matthew.brost@intel.com>
>>
>>CTB writes are now in the path of command submission and should be
>>optimized for performance. Rather than reading CTB descriptor values
>>(e.g. head, tail, size) which could result in accesses across the PCIe
>>bus, store shadow local copies and only read/write the descriptor
>>values when absolutely necessary.
>>
>>Cc: John Harrison <john.c.harrison@intel.com>
>>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>---
>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 79 +++++++++++------------
>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  8 +++
>> 2 files changed, 45 insertions(+), 42 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
>>b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>index e50d968b15d5..4d8a4c6afd71 100644
>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>@@ -68,23 +68,29 @@ static inline const char 
>>*guc_ct_buffer_type_to_str(u32 type)
>> 	}
>> }
>>-static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
>>+static void guc_ct_buffer_desc_init(struct intel_guc_ct_buffer *ctb,
>> 				    u32 cmds_addr, u32 size, u32 owner)
>
>as now this function takes ctb instead of desc, it should be renamed
>or make it separate from guc_ct_buffer_desc_init
>

Yes, makes sense.

>> {
>>+	struct guc_ct_buffer_desc *desc = ctb->desc;
>> 	CT_DEBUG_DRIVER("CT: desc %p init addr=%#x size=%u owner=%u\n",
>> 			desc, cmds_addr, size, owner);
>> 	memset(desc, 0, sizeof(*desc));
>> 	desc->addr = cmds_addr;
>>-	desc->size = size;
>>+	ctb->size = desc->size = size;
>> 	desc->owner = owner;
>>+	ctb->tail = 0;
>>+	ctb->head = 0;
>>+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>> }
>>-static void guc_ct_buffer_desc_reset(struct guc_ct_buffer_desc *desc)
>>+static void guc_ct_buffer_desc_reset(struct intel_guc_ct_buffer *ctb)
>
>same here
>

Same.

>> {
>>+	struct guc_ct_buffer_desc *desc = ctb->desc;
>> 	CT_DEBUG_DRIVER("CT: desc %p reset head=%u tail=%u\n",
>> 			desc, desc->head, desc->tail);
>>-	desc->head = 0;
>>-	desc->tail = 0;
>>+	ctb->head = desc->head = 0;
>>+	ctb->tail = desc->tail = 0;
>>+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>> 	desc->is_in_error = 0;
>> }
>>@@ -220,7 +226,7 @@ static int ctch_enable(struct intel_guc *guc,
>> 	 */
>> 	for (i = 0; i < ARRAY_SIZE(ctch->ctbs); i++) {
>> 		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
>>-		guc_ct_buffer_desc_init(ctch->ctbs[i].desc,
>>+		guc_ct_buffer_desc_init(&ctch->ctbs[i],
>> 					base + PAGE_SIZE/4 * i + PAGE_SIZE/2,
>> 					PAGE_SIZE/4,
>> 					ctch->owner);
>>@@ -301,32 +307,16 @@ static int ctb_write(struct 
>>intel_guc_ct_buffer *ctb,
>> 		     bool want_response)
>> {
>> 	struct guc_ct_buffer_desc *desc = ctb->desc;
>>-	u32 head = desc->head / 4;	/* in dwords */
>>-	u32 tail = desc->tail / 4;	/* in dwords */
>>-	u32 size = desc->size / 4;	/* in dwords */
>>-	u32 used;			/* in dwords */
>>+	u32 tail = ctb->tail / 4;	/* in dwords */
>>+	u32 size = ctb->size / 4;	/* in dwords */
>> 	u32 header;
>> 	u32 *cmds = ctb->cmds;
>> 	unsigned int i;
>>-	GEM_BUG_ON(desc->size % 4);
>>-	GEM_BUG_ON(desc->head % 4);
>>-	GEM_BUG_ON(desc->tail % 4);
>>+	GEM_BUG_ON(ctb->size % 4);
>>+	GEM_BUG_ON(ctb->tail % 4);
>> 	GEM_BUG_ON(tail >= size);
>>-	/*
>>-	 * tail == head condition indicates empty. GuC FW does not support
>>-	 * using up the entire buffer to get tail == head meaning full.
>>-	 */
>>-	if (tail < head)
>>-		used = (size - head) + tail;
>>-	else
>>-		used = tail - head;
>>-
>>-	/* make sure there is a space including extra dw for the fence */
>>-	if (unlikely(used + len + 1 >= size))
>>-		return -ENOSPC;
>>-
>> 	/*
>> 	 * Write the message. The format is the following:
>> 	 * DW0: header (including action code)
>>@@ -354,15 +344,16 @@ static int ctb_write(struct 
>>intel_guc_ct_buffer *ctb,
>> 	}
>>	/* now update desc tail (back in bytes) */
>>-	desc->tail = tail * 4;
>>-	GEM_BUG_ON(desc->tail > desc->size);
>>+	ctb->tail = desc->tail = tail * 4;
>>+	ctb->space -= (len + 1) * 4;
>>+	GEM_BUG_ON(ctb->tail > ctb->size);
>>	return 0;
>> }
>>/**
>>  * wait_for_ctb_desc_update - Wait for the CT buffer descriptor update.
>>- * @desc:	buffer descriptor
>>+ * @ctb:	ctb buffer
>>  * @fence:	response fence
>>  * @status:	placeholder for status
>>  *
>>@@ -376,11 +367,12 @@ static int ctb_write(struct 
>>intel_guc_ct_buffer *ctb,
>>  * *	-ETIMEDOUT no response within hardcoded timeout
>>  * *	-EPROTO no response, CT buffer is in error
>>  */
>>-static int wait_for_ctb_desc_update(struct guc_ct_buffer_desc *desc,
>>+static int wait_for_ctb_desc_update(struct intel_guc_ct_buffer *ctb,
>> 				    u32 fence,
>> 				    u32 *status)
>> {
>> 	int err;
>>+	struct guc_ct_buffer_desc *desc = ctb->desc;
>>	/*
>> 	 * Fast commands should complete in less than 10us, so sample quickly
>>@@ -401,7 +393,7 @@ static int wait_for_ctb_desc_update(struct 
>>guc_ct_buffer_desc *desc,
>> 			/* Something went wrong with the messaging, try to reset
>> 			 * the buffer and hope for the best
>> 			 */
>>-			guc_ct_buffer_desc_reset(desc);
>>+			guc_ct_buffer_desc_reset(ctb);
>> 			err = -EPROTO;
>> 		}
>> 	}
>>@@ -446,12 +438,17 @@ static int wait_for_ct_request_update(struct 
>>ct_request *req, u32 *status)
>> 	return err;
>> }
>>-static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, 
>>u32 len)
>>+static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, 
>>u32 len)
>> {
>>-	u32 head = READ_ONCE(desc->head);
>>+	u32 head;
>> 	u32 space;
>>-	space = CIRC_SPACE(desc->tail, head, desc->size);
>>+	if (ctb->space >= len)
>>+		return true;
>>+
>>+	head = READ_ONCE(ctb->desc->head);
>>+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
>>+	ctb->space = space;
>>	return space >= len;
>> }
>>@@ -462,7 +459,6 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
>> {
>> 	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>> 	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>-	struct guc_ct_buffer_desc *desc = ctb->desc;
>> 	int err;
>>	GEM_BUG_ON(!ctch->enabled);
>>@@ -470,7 +466,7 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
>> 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>> 	lockdep_assert_held(&ct->send_lock);
>>-	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>+	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
>> 		ct->retry++;
>> 		if (ct->retry >= MAX_RETRY)
>> 			return -EDEADLK;
>>@@ -496,7 +492,6 @@ static int ctch_send(struct intel_guc_ct *ct,
>> 		     u32 *status)
>> {
>> 	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>-	struct guc_ct_buffer_desc *desc = ctb->desc;
>> 	struct ct_request request;
>> 	unsigned long flags;
>> 	u32 fence;
>>@@ -514,7 +509,7 @@ static int ctch_send(struct intel_guc_ct *ct,
>> 	 */
>> retry:
>> 	spin_lock_irqsave(&ct->send_lock, flags);
>>-	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>+	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
>> 		spin_unlock_irqrestore(&ct->send_lock, flags);
>> 		ct->retry++;
>> 		if (ct->retry >= MAX_RETRY)
>>@@ -544,7 +539,7 @@ static int ctch_send(struct intel_guc_ct *ct,
>> 	if (response_buf)
>> 		err = wait_for_ct_request_update(&request, status);
>> 	else
>>-		err = wait_for_ctb_desc_update(desc, fence, status);
>>+		err = wait_for_ctb_desc_update(ctb, fence, status);
>> 	if (unlikely(err))
>> 		goto unlink;
>>@@ -618,9 +613,9 @@ static inline bool ct_header_is_response(u32 header)
>> static int ctb_read(struct intel_guc_ct_buffer *ctb, u32 *data)
>> {
>> 	struct guc_ct_buffer_desc *desc = ctb->desc;
>>-	u32 head = desc->head / 4;	/* in dwords */
>>+	u32 head = ctb->head / 4;	/* in dwords */
>> 	u32 tail = desc->tail / 4;	/* in dwords */
>>-	u32 size = desc->size / 4;	/* in dwords */
>>+	u32 size = ctb->size / 4;	/* in dwords */
>> 	u32 *cmds = ctb->cmds;
>> 	s32 available;			/* in dwords */
>> 	unsigned int len;
>>@@ -664,7 +659,7 @@ static int ctb_read(struct intel_guc_ct_buffer *ctb, u32 *data)
>> 	}
>> 	CT_DEBUG_DRIVER("CT: received %*ph\n", 4 * len, data);
>>-	desc->head = head * 4;
>>+	ctb->head = desc->head = head * 4;
>> 	return 0;
>> }
>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>index bc670a796bd8..1bff4f0b91f7 100644
>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>@@ -29,10 +29,18 @@ struct intel_guc;
>>  *
>>  * @desc: pointer to the buffer descriptor
>>  * @cmds: pointer to the commands buffer
>>+ * @size: local shadow copy of size
>
>I would rather expect this as real fixed size,
>note that size is not expected to change
>

Yes, it is fixed over the life of the CTB, but not all CTBs need to be the same
size. E.g. the H2G and G2H may, and likely will, be different sizes with the new
GuC interface.
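
To illustrate the point about per-buffer sizes, here is a rough userspace
sketch of the shadow-copy idea, not the driver code. The example_* types and
helpers are hypothetical stand-ins for guc_ct_buffer_desc and
intel_guc_ct_buffer, and the CIRC_* macros are copied from linux/circ_buf.h so
the snippet builds on its own. Each buffer carries its own size, and the shared
descriptor is only re-read when the cached space is too small.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same definitions as in include/linux/circ_buf.h. */
#define CIRC_CNT(head, tail, size) (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Hypothetical stand-in for the descriptor shared with the firmware. */
struct example_desc {
	volatile uint32_t head;	/* advanced by the consumer */
	uint32_t tail;		/* advanced by the producer */
	uint32_t size;		/* fixed for the life of the buffer */
};

/* Hypothetical stand-in for the host-side buffer with shadow copies. */
struct example_ctb {
	struct example_desc *desc;
	uint32_t size;		/* per-buffer, set once at init */
	uint32_t tail;		/* shadow of desc->tail */
	uint32_t space;		/* cached free bytes, may be stale (too low) */
};

static void example_ctb_init(struct example_ctb *ctb,
			     struct example_desc *desc, uint32_t size)
{
	desc->head = 0;
	desc->tail = 0;
	desc->size = size;
	ctb->desc = desc;
	ctb->size = size;
	ctb->tail = 0;
	ctb->space = CIRC_SPACE(ctb->tail, 0u, ctb->size);
}

/* Fast path uses the cached value, slow path re-reads the shared head. */
static bool example_ctb_has_room(struct example_ctb *ctb, uint32_t len)
{
	if (ctb->space >= len)
		return true;

	ctb->space = CIRC_SPACE(ctb->tail, ctb->desc->head, ctb->size);
	return ctb->space >= len;
}

/* Account for len bytes written by the producer. */
static void example_ctb_reserve(struct example_ctb *ctb, uint32_t len)
{
	ctb->tail = (ctb->tail + len) & (ctb->size - 1);
	ctb->desc->tail = ctb->tail;
	ctb->space -= len;
}
```

The cached space can only be an underestimate, since the consumer only ever
frees space, so a stale value costs an extra descriptor read but never an
overrun.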

>>+ * @head: local shadow copy of head
>>+ * @tail: local shadow copy of tail
>>+ * @space: local shadow copy of space
>>  */
>> struct intel_guc_ct_buffer {
>> 	struct guc_ct_buffer_desc *desc;
>> 	u32 *cmds;
>>+	u32 size;
>>+	u32 tail;
>>+	u32 head;
>>+	u32 space;
>> };
>>/** Represents pair of command transport buffers.
>
>Can we reorder this patch to be first in the series ?
>
>Michal

Yes.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/3] drm/i915/guc: Increase size of CTB buffers
@ 2019-11-21 16:11       ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2019-11-21 16:11 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: Intel-GFX

On Thu, Nov 21, 2019 at 01:25:05PM +0100, Michal Wajdeczko wrote:
>On Thu, 21 Nov 2019 00:56:04 +0100, <John.C.Harrison@intel.com> wrote:
>
>>From: Matthew Brost <matthew.brost@intel.com>
>>
>>With the introduction of non-blocking CTBs more than one CTB can be in
>>flight at a time. Increasing the size of the CTBs should reduce how
>>often software hits the case where no space is available in the CTB
>>buffer.
>>
>>Cc: John Harrison <john.c.harrison@intel.com>
>>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>---
>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 50 +++++++++++++++--------
>> 1 file changed, 32 insertions(+), 18 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>index 4d8a4c6afd71..31c512e7ecc2 100644
>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>@@ -14,6 +14,10 @@
>> #define CT_DEBUG_DRIVER(...)	do { } while (0)
>> #endif
>>+#define CTB_DESC_SIZE		(PAGE_SIZE / 2)
>
>CTB descriptors are now 64B each, not sure why we want to waste a
>whole page for them. Maybe to better use space (and be ready for
>upcoming changes) we can place the buffer right after the descriptor:
>
> *       <------ DESCRIPTOR ------> <--- BUFFER ------------->
> *
> *      +--------------------------+--------------------------+
> *      | addr | head | tail | ... |                          |
> *      +--------------------------+--------------------------+
> *          \                      ^
> *           \____________________/
> *
> *       <------------ 64B -------> <---- n * PAGE - 64B ---->
>

I agree that is wasteful, but if linux/circ_buf.h is to be used to determine
the space in the buffer, the buffer size has to be a power of 2. I suspect the
GuC determines space in a similar way and requires the size to be a power of 2.
I can reach out and find out if this is the case.
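
For reference, a small standalone sketch of why the power of 2 matters. The
CIRC_* macros are copied from linux/circ_buf.h; the two helper functions and
their names are made up for this example. The macros reduce the index
difference with a mask (size - 1), which only equals a modulo when size is a
power of 2, so a buffer of PAGE - 64 bytes would report the wrong space.

```c
#include <stdint.h>

/* Same definitions as in include/linux/circ_buf.h. */
#define CIRC_CNT(head, tail, size) (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/*
 * Reference result computed with a modulo, valid for any size. One byte is
 * always kept unused so that write == read means "empty".
 */
static uint32_t ring_space_any_size(uint32_t write, uint32_t read,
				    uint32_t size)
{
	uint32_t used = (write + size - read) % size;

	return size - used - 1;
}

/* Mask-based result, only meaningful when size is a power of 2. */
static uint32_t ring_space_masked(uint32_t write, uint32_t read,
				  uint32_t size)
{
	return CIRC_SPACE(write, read, size);
}
```

With write = 100 and read = 0, both helpers agree on 3995 free bytes for a
4096 byte buffer. For a 4032 byte buffer (4096 - 64) the modulo version gives
3931 while the masked version still gives 3995, i.e. it overestimates the free
space by 64 bytes.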

>>+#define CTB_H2G_BUFFER_SIZE	(PAGE_SIZE)
>>+#define CTB_G2H_BUFFER_SIZE	(CTB_H2G_BUFFER_SIZE * 2)
>>+
>> #define MAX_RETRY		0x1000000
>>struct ct_request {
>>@@ -143,30 +147,35 @@ static int ctch_init(struct intel_guc *guc,
>>	GEM_BUG_ON(ctch->vma);
>>-	/* We allocate 1 page to hold both descriptors and both buffers.
>>+	/* We allocate 3 pages to hold both descriptors and both buffers.
>> 	 *       ___________.....................
>> 	 *      |desc (SEND)|                   :
>>-	 *      |___________|                   PAGE/4
>>+	 *      |___________|                   PAGE/2
>> 	 *      :___________....................:
>> 	 *      |desc (RECV)|                   :
>>-	 *      |___________|                   PAGE/4
>>+	 *      |___________|                   PAGE/2
>> 	 *      :_______________________________:
>> 	 *      |cmds (SEND)                    |
>>-	 *      |                               PAGE/4
>>+	 *      |                               PAGE
>> 	 *      |_______________________________|
>> 	 *      |cmds (RECV)                    |
>>-	 *      |                               PAGE/4
>>+	 *      |                               PAGE * 2
>> 	 *      |_______________________________|
>> 	 *
>> 	 * Each message can use a maximum of 32 dwords and we don't expect to
>>-	 * have more than 1 in flight at any time, so we have enough space.
>>-	 * Some logic further ahead will rely on the fact that there is only 1
>>-	 * page and that it is always mapped, so if the size is changed the
>>-	 * other code will need updating as well.
>>+	 * have more than 1 in flight at any time, unless we are using the GuC
>>+	 * submission. In that case each request requires a minimum 8 bytes
>>+	 * which gives us a maximum 512 queue'd requests. Hopefully this enough
>
>hmm, do we really expect to have 512 messages in flight ?
>

Potentially, yes. In fact we may expect more than that. Part of the reason for
this patch is to add the defines CTB_H2G_BUFFER_SIZE and CTB_G2H_BUFFER_SIZE so
that the CTB sizes can easily be changed when profiling the code. We are going
to want to size these buffers in a way that flow control (no space in the
buffer) is rarely hit.

BTW - This comment is wrong. The CTB header is 8 bytes and I think the minimum
payload is 8 bytes, so this should actually be 256 in flight.
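
The arithmetic behind that number, as a sketch. The EXAMPLE_* names are made up
for illustration, a 4 KiB page is assumed, and the 8 byte minimum payload is
the estimate above rather than a confirmed figure.

```c
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE		4096u	/* assumption: 4 KiB pages */
#define EXAMPLE_H2G_BUFFER_SIZE		EXAMPLE_PAGE_SIZE
#define EXAMPLE_CT_HEADER_BYTES		8u	/* header size stated above */
#define EXAMPLE_MIN_PAYLOAD_BYTES	8u	/* estimated minimum payload */

/* Upper bound on minimum-sized messages that fit in buffer_bytes. */
static uint32_t example_max_in_flight(uint32_t buffer_bytes)
{
	return buffer_bytes /
	       (EXAMPLE_CT_HEADER_BYTES + EXAMPLE_MIN_PAYLOAD_BYTES);
}
```

example_max_in_flight(EXAMPLE_H2G_BUFFER_SIZE) is 4096 / 16 = 256. This is an
upper bound only, since CIRC_SPACE never reports more than size - 1 bytes free.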

>>+	 * space to avoid backpressure on the driver. We also double the size of
>>+	 * the receive buffer (relative to the send) to ensure a g2h response
>>+	 * CTB has a landing spot.
>
>We do plan to send non-blocking messages that might generate higher traffic,
>but do we expect a matching increase in incoming traffic? What kind of data
>will be there? And do we expect that the driver will be unable to consume
>them in a timely manner?
>

Yes, as part of the new interface there are asynchronous G2H messages received
in response to an H2G. In addition there are several spontaneous G2H messages
generated by the GuC.

>> 	 */
>>	/* allocate vma */
>>-	vma = intel_guc_allocate_vma(guc, PAGE_SIZE);
>>+	vma = intel_guc_allocate_vma(guc, CTB_DESC_SIZE *
>>+				     ARRAY_SIZE(ctch->ctbs) +
>>+				     CTB_H2G_BUFFER_SIZE +
>>+				     CTB_G2H_BUFFER_SIZE);
>> 	if (IS_ERR(vma)) {
>> 		err = PTR_ERR(vma);
>> 		goto err_out;
>>@@ -185,8 +194,9 @@ static int ctch_init(struct intel_guc *guc,
>> 	/* store pointers to desc and cmds */
>> 	for (i = 0; i < ARRAY_SIZE(ctch->ctbs); i++) {
>> 		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
>>-		ctch->ctbs[i].desc = blob + PAGE_SIZE/4 * i;
>>-		ctch->ctbs[i].cmds = blob + PAGE_SIZE/4 * i + PAGE_SIZE/2;
>>+		ctch->ctbs[i].desc = blob + CTB_DESC_SIZE * i;
>>+		ctch->ctbs[i].cmds = blob + CTB_H2G_BUFFER_SIZE * i +
>>+			CTB_DESC_SIZE * ARRAY_SIZE(ctch->ctbs);
>> 	}
>>	return 0;
>>@@ -210,7 +220,7 @@ static void ctch_fini(struct intel_guc *guc,
>> static int ctch_enable(struct intel_guc *guc,
>> 		       struct intel_guc_ct_channel *ctch)
>> {
>>-	u32 base;
>>+	u32 base, size;
>> 	int err;
>> 	int i;
>>@@ -226,9 +236,12 @@ static int ctch_enable(struct intel_guc *guc,
>> 	 */
>> 	for (i = 0; i < ARRAY_SIZE(ctch->ctbs); i++) {
>> 		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
>>+		size = (i == CTB_SEND) ? CTB_H2G_BUFFER_SIZE :
>>+			CTB_G2H_BUFFER_SIZE;
>> 		guc_ct_buffer_desc_init(&ctch->ctbs[i],
>>-					base + PAGE_SIZE/4 * i + PAGE_SIZE/2,
>>-					PAGE_SIZE/4,
>>+					base + CTB_H2G_BUFFER_SIZE * i +
>>+					CTB_DESC_SIZE * ARRAY_SIZE(ctch->ctbs),
>>+					size,
>> 					ctch->owner);
>> 	}
>>@@ -236,13 +249,13 @@ static int ctch_enable(struct intel_guc *guc,
>> 	 * descriptors are in first half of the blob
>> 	 */
>> 	err = guc_action_register_ct_buffer(guc,
>>-					    base + PAGE_SIZE/4 * CTB_RECV,
>>+					    base + CTB_DESC_SIZE * CTB_RECV,
>> 					    INTEL_GUC_CT_BUFFER_TYPE_RECV);
>> 	if (unlikely(err))
>> 		goto err_out;
>>	err = guc_action_register_ct_buffer(guc,
>>-					    base + PAGE_SIZE/4 * CTB_SEND,
>>+					    base + CTB_DESC_SIZE * CTB_SEND,
>> 					    INTEL_GUC_CT_BUFFER_TYPE_SEND);
>> 	if (unlikely(err))
>> 		goto err_deregister;
>>@@ -635,7 +648,8 @@ static int ctb_read(struct intel_guc_ct_buffer *ctb, u32 *data)
>> 	/* beware of buffer wrap case */
>> 	if (unlikely(available < 0))
>> 		available += size;
>>-	CT_DEBUG_DRIVER("CT: available %d (%u:%u)\n", available, head, tail);
>>+	CT_DEBUG_DRIVER("CT: available %d (%u:%u:%d)\n", available, head, tail,
>>+			size);
>
>size will not change, not sure if it is worth repeating that in every log
>

This is a debug mechanism not turned on in normal operation. I find more
information helpful when debugging problems.

>> 	GEM_BUG_ON(available < 0);
>>	data[0] = cmds[head];

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] drm/i915/guc: Add non blocking CTB send function
@ 2019-11-22  0:13       ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2019-11-22  0:13 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: Intel-GFX

On Thu, Nov 21, 2019 at 12:43:26PM +0100, Michal Wajdeczko wrote:
>On Thu, 21 Nov 2019 00:56:02 +0100, <John.C.Harrison@intel.com> wrote:
>
>>From: Matthew Brost <matthew.brost@intel.com>
>>
>>Add non blocking CTB send fuction, intel_guc_send_nb. In order to
>>support a non blocking CTB send fuction a spin lock is needed to
>
>2x typos
>
>>protect the CTB descriptors fields. Also the non blocking call must not
>>update the fence value as this value is owned by the blocking call
>>(intel_guc_send).
>
>you probably mean "intel_guc_send_ct", as intel_guc_send is just a wrapper
>around guc->send
>

Ah, yes.

>>
>>The blocking CTB now must have a flow control mechanism to ensure the
>>buffer isn't overrun. A lazy spin wait is used as we believe the flow
>>control condition should be rare with properly sized buffer. A retry
>>counter is also implemented which fails H2G CTBs once a limit is
>>reached to prevent deadlock.
>>
>>The function, intel_guc_send_nb, is exported in this patch but unused.
>>Several patches later in the series make use of this function.
>
>It's likely in yet another series
>

Yes, it is.

>>
>>Cc: John Harrison <john.c.harrison@intel.com>
>>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>---
>> drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  2 +
>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 97 +++++++++++++++++++----
>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 10 ++-
>> 3 files changed, 91 insertions(+), 18 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>index e6400204a2bd..77c5af919ace 100644
>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>@@ -94,6 +94,8 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>> 	return guc->send(guc, action, len, response_buf, response_buf_size);
>> }
>>+int intel_guc_send_nb(struct intel_guc_ct *ct, const u32 *action, u32 len);
>>+
>
>Hmm, this mismatch of guc/ct parameter breaks our layering.
>But we can keep this layering intact by introducing some flags to
>the existing guc_send() function. These flags could be passed as
>high bits in action[0], like this:

This seems reasonable.

>
>#define GUC_ACTION_FLAG_DONT_WAIT 0x80000000
>
>int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
>{
>	u32 action[] = {
>		INTEL_GUC_ACTION_AUTHENTICATE_HUC | GUC_ACTION_FLAG_DONT_WAIT,
>		rsa_offset
>	};
>
>	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>}
>
>then actual back-end of guc->send can take proper steps based on this flag:
>
>@@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, 
>u32 len,
>        GEM_BUG_ON(!len);
>        GEM_BUG_ON(len > guc->send_regs.count);
>
>+       if (*action & GUC_ACTION_FLAG_DONT_WAIT)
>+               return -EINVAL;
>+       *action &= ~GUC_ACTION_FLAG_DONT_WAIT;
>+
>        /* We expect only action code */
>        GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK);
>
>@@ @@ int intel_guc_send_ct(struct intel_guc *guc, const u32 *action, 
>u32 len,
>        u32 status = ~0; /* undefined */
>        int ret;
>
>+       if (*action & GUC_ACTION_FLAG_DONT_WAIT) {
>+               GEM_BUG_ON(response_buf);
>+               GEM_BUG_ON(response_buf_size);
>+               return ctch_send_nb(ct, ctch, action, len);
>+       }
>+
>        mutex_lock(&guc->send_mutex);
>
>        ret = ctch_send(ct, ctch, action, len, response_buf, 
>response_buf_size,
>
>
>> static inline void intel_guc_notify(struct intel_guc *guc)
>> {
>> 	guc->notify(guc);
>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
>>b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>index b49115517510..e50d968b15d5 100644
>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>@@ -3,6 +3,8 @@
>>  * Copyright © 2016-2019 Intel Corporation
>>  */
>>+#include <linux/circ_buf.h>
>>+
>> #include "i915_drv.h"
>> #include "intel_guc_ct.h"
>>@@ -12,6 +14,8 @@
>> #define CT_DEBUG_DRIVER(...)	do { } while (0)
>> #endif
>>+#define MAX_RETRY		0x1000000
>>+
>> struct ct_request {
>> 	struct list_head link;
>> 	u32 fence;
>>@@ -40,7 +44,8 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>> 	/* we're using static channel owners */
>> 	ct->host_channel.owner = CTB_OWNER_HOST;
>>-	spin_lock_init(&ct->lock);
>>+	spin_lock_init(&ct->request_lock);
>>+	spin_lock_init(&ct->send_lock);
>> 	INIT_LIST_HEAD(&ct->pending_requests);
>> 	INIT_LIST_HEAD(&ct->incoming_requests);
>> 	INIT_WORK(&ct->worker, ct_incoming_request_worker_func);
>>@@ -291,7 +296,8 @@ static u32 ctch_get_next_fence(struct 
>>intel_guc_ct_channel *ctch)
>> static int ctb_write(struct intel_guc_ct_buffer *ctb,
>> 		     const u32 *action,
>> 		     u32 len /* in dwords */,
>>-		     u32 fence,
>>+		     u32 fence_value,
>>+		     bool enable_fence,
>
>maybe we can just guarantee that fence=0 will never be used as a valid
>fence id, then this flag could be replaced with (fence != 0) check.
>

Yes, again seems reasonable. Initialize next_fence = 1, then increment by 2 each
time and this works.
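
A rough userspace sketch of that fence scheme, just to show why 0 can never
come out of it. This is not the i915 code; struct fence_alloc and both helpers
are made-up names for illustration. Starting odd and stepping by 2 keeps the
value odd across u32 wraparound, so 0 stays free to mean "no fence".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the channel's fence counter. */
struct fence_alloc {
	uint32_t next_fence;
};

static void fence_alloc_init(struct fence_alloc *fa)
{
	fa->next_fence = 1;	/* odd start, so 0 is never handed out */
}

static uint32_t fence_alloc_next(struct fence_alloc *fa)
{
	uint32_t fence = fa->next_fence;

	fa->next_fence += 2;	/* unsigned wraparound keeps the value odd */
	assert(fence != 0);	/* 0 is reserved for "fence disabled" */
	return fence;
}
```

With that in place ctb_write() could test (fence != 0) instead of taking a
separate enable_fence argument.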

>> 		     bool want_response)
>> {
>> 	struct guc_ct_buffer_desc *desc = ctb->desc;
>>@@ -328,18 +334,18 @@ static int ctb_write(struct 
>>intel_guc_ct_buffer *ctb,
>> 	 * DW2+: action data
>> 	 */
>> 	header = (len << GUC_CT_MSG_LEN_SHIFT) |
>>-		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
>>+		 (enable_fence ? GUC_CT_MSG_WRITE_FENCE_TO_DESC : 0) |
>
>Hmm, even if we ask the fw not to write back the fence to the descriptor,
>IIRC current firmware will unconditionally write back return status
>of this non-blocking call, possibly overwriting status of the blocked
>call.
>

Yes, known problem with the interface that needs to be fixed.

>> 		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
>
>btw, if we switch all requests to expect a reply sent back over CTB,
>then we can possibly drop the send_mutex in CTB paths, and block
>only when there is no DONT_WAIT flag and we have to wait for response.
>

I would rather just wait for the GuC to fix this.

>> 		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
>>	CT_DEBUG_DRIVER("CT: writing %*ph %*ph %*ph\n",
>>-			4, &header, 4, &fence,
>>+			4, &header, 4, &fence_value,
>> 			4 * (len - 1), &action[1]);
>>	cmds[tail] = header;
>> 	tail = (tail + 1) % size;
>>-	cmds[tail] = fence;
>>+	cmds[tail] = fence_value;
>> 	tail = (tail + 1) % size;
>>	for (i = 1; i < len; i++) {
>>@@ -440,6 +446,47 @@ static int wait_for_ct_request_update(struct 
>>ct_request *req, u32 *status)
>> 	return err;
>> }
>>+static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, 
>>u32 len)
>>+{
>>+	u32 head = READ_ONCE(desc->head);
>>+	u32 space;
>>+
>>+	space = CIRC_SPACE(desc->tail, head, desc->size);
>>+
>>+	return space >= len;
>>+}
>>+
>>+int intel_guc_send_nb(struct intel_guc_ct *ct,
>>+		      const u32 *action,
>>+		      u32 len)
>>+{
>>+	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>>+	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>+	struct guc_ct_buffer_desc *desc = ctb->desc;
>>+	int err;
>>+
>>+	GEM_BUG_ON(!ctch->enabled);
>>+	GEM_BUG_ON(!len);
>>+	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>+	lockdep_assert_held(&ct->send_lock);
>
>hmm, does it mean that it's now the caller's responsibility to take the
>CT private spinlock? That is not how other guc_send() functions work.
>

Yes, that is how I would like this to work, as I feel it gives more flexibility
to the caller in the -EBUSY case. The caller can call intel_guc_send_nb again
while still holding the lock, or it can release the lock and use a different
form of flow control. Perhaps locking / unlocking should be exposed via static
inlines rather than the caller directly manipulating the lock?
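
A minimal userspace sketch of what such wrappers might look like, assuming a
C11 atomic_flag as a stand-in for the spinlock. Every name here (ct_sketch,
ct_sketch_send_lock and so on) is hypothetical and only meant to show the
calling convention, not the real driver interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical model of the CT state the caller would lock around. */
struct ct_sketch {
	atomic_flag send_lock;	/* stand-in for spinlock_t send_lock */
	unsigned int space;	/* free space in the send buffer, in dwords */
};

static inline void ct_sketch_init(struct ct_sketch *ct, unsigned int space)
{
	atomic_flag_clear(&ct->send_lock);
	ct->space = space;
}

/* Wrappers so callers never touch the lock member directly. */
static inline void ct_sketch_send_lock(struct ct_sketch *ct)
{
	while (atomic_flag_test_and_set_explicit(&ct->send_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
}

static inline void ct_sketch_send_unlock(struct ct_sketch *ct)
{
	atomic_flag_clear_explicit(&ct->send_lock, memory_order_release);
}

/* Caller must hold the send lock; len excludes the extra fence dword. */
static inline int ct_sketch_send_nb(struct ct_sketch *ct, unsigned int len)
{
	if (ct->space < len + 1)
		return -EBUSY;	/* caller decides: retry or back off */

	ct->space -= len + 1;
	return 0;
}
```

A caller would then take the lock, attempt the send, and on -EBUSY either try
again under the same lock or drop it and fall back to its own flow control.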

>>+
>>+	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>+		ct->retry++;
>>+		if (ct->retry >= MAX_RETRY)
>>+			return -EDEADLK;
>>+		else
>>+			return -EBUSY;
>>+	}
>>+
>>+	ct->retry = 0;
>>+	err = ctb_write(ctb, action, len, 0, false, false);
>>+	if (unlikely(err))
>>+		return err;
>>+
>>+	intel_guc_notify(ct_to_guc(ct));
>>+	return 0;
>>+}
>>+
>> static int ctch_send(struct intel_guc_ct *ct,
>> 		     struct intel_guc_ct_channel *ctch,
>> 		     const u32 *action,
>>@@ -460,17 +507,35 @@ static int ctch_send(struct intel_guc_ct *ct,
>> 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>> 	GEM_BUG_ON(!response_buf && response_buf_size);
>>+	/*
>>+	 * We use a lazy spin wait loop here as we believe that if the CT
>>+	 * buffers are sized correctly the flow control condition should be
>>+	 * rare.
>>+	 */
>>+retry:
>>+	spin_lock_irqsave(&ct->send_lock, flags);
>>+	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>+		spin_unlock_irqrestore(&ct->send_lock, flags);
>>+		ct->retry++;
>>+		if (ct->retry >= MAX_RETRY)
>>+			return -EDEADLK;
>
>I'm not sure what's better: a hidden deadlock that is hard to reproduce,
>or deadlocks that are easier to catch, which helps us become deadlock-clean
>

This covers the case where the GuC has died, to avoid deadlock. Eventually
we will have some GuC health check code that will trigger a full GPU reset if
the GuC has died. We need a way for code spinning on the CTBs to exit.

I've already tweaked this code locally a bit to use an atomic too, with the idea
being that the GuC health check code can set this value to have all code
spinning on CTBs immediately return -EDEADLK when the GuC has died.
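
Roughly along these lines, as a userspace sketch using C11 atomics rather than
the kernel ones. The ct_wait struct, the guc_dead flag and the helper names
are assumptions for illustration, not what the local change actually looks
like.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: free space is updated by the consumer side. */
struct ct_wait {
	atomic_bool guc_dead;	/* set by the (future) health check code */
	atomic_uint space;	/* free space in the send buffer */
};

static void ct_wait_init(struct ct_wait *ct, unsigned int space)
{
	atomic_init(&ct->guc_dead, false);
	atomic_init(&ct->space, space);
}

/* What the health check would call once it decides the GuC has died. */
static void ct_wait_mark_dead(struct ct_wait *ct)
{
	atomic_store(&ct->guc_dead, true);
}

/*
 * Spin until there is room, bailing out with -EDEADLK either when the
 * dead flag is set or when the retry limit is hit.
 */
static int ct_wait_for_room(struct ct_wait *ct, unsigned int needed,
			    unsigned int max_retry)
{
	unsigned int retry = 0;

	while (atomic_load(&ct->space) < needed) {
		if (atomic_load(&ct->guc_dead))
			return -EDEADLK;
		if (++retry >= max_retry)
			return -EDEADLK;
	}

	return 0;
}
```

The retry limit stays as a backstop, while the flag lets every spinner exit
on its next iteration once the health check fires.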

>>+		cpu_relax();
>>+		goto retry;
>>+	}
>>+
>>+	ct->retry = 0;
>> 	fence = ctch_get_next_fence(ctch);
>> 	request.fence = fence;
>> 	request.status = 0;
>> 	request.response_len = response_buf_size;
>> 	request.response_buf = response_buf;
>>-	spin_lock_irqsave(&ct->lock, flags);
>>+	spin_lock(&ct->request_lock);
>> 	list_add_tail(&request.link, &ct->pending_requests);
>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>+	spin_unlock(&ct->request_lock);
>>-	err = ctb_write(ctb, action, len, fence, !!response_buf);
>>+	err = ctb_write(ctb, action, len, fence, true, !!response_buf);
>>+	spin_unlock_irqrestore(&ct->send_lock, flags);
>> 	if (unlikely(err))
>> 		goto unlink;
>>@@ -501,9 +566,9 @@ static int ctch_send(struct intel_guc_ct *ct,
>> 	}
>>unlink:
>>-	spin_lock_irqsave(&ct->lock, flags);
>>+	spin_lock_irqsave(&ct->request_lock, flags);
>> 	list_del(&request.link);
>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>+	spin_unlock_irqrestore(&ct->request_lock, flags);
>>	return err;
>> }
>>@@ -653,7 +718,7 @@ static int ct_handle_response(struct 
>>intel_guc_ct *ct, const u32 *msg)
>>	CT_DEBUG_DRIVER("CT: response fence %u status %#x\n", fence, status);
>>-	spin_lock(&ct->lock);
>>+	spin_lock(&ct->request_lock);
>> 	list_for_each_entry(req, &ct->pending_requests, link) {
>> 		if (unlikely(fence != req->fence)) {
>> 			CT_DEBUG_DRIVER("CT: request %u awaits response\n",
>>@@ -672,7 +737,7 @@ static int ct_handle_response(struct 
>>intel_guc_ct *ct, const u32 *msg)
>> 		found = true;
>> 		break;
>> 	}
>>-	spin_unlock(&ct->lock);
>>+	spin_unlock(&ct->request_lock);
>>	if (!found)
>> 		DRM_ERROR("CT: unsolicited response %*ph\n", 4 * msglen, msg);
>>@@ -710,13 +775,13 @@ static bool 
>>ct_process_incoming_requests(struct intel_guc_ct *ct)
>> 	u32 *payload;
>> 	bool done;
>>-	spin_lock_irqsave(&ct->lock, flags);
>>+	spin_lock_irqsave(&ct->request_lock, flags);
>> 	request = list_first_entry_or_null(&ct->incoming_requests,
>> 					   struct ct_incoming_request, link);
>> 	if (request)
>> 		list_del(&request->link);
>> 	done = !!list_empty(&ct->incoming_requests);
>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>+	spin_unlock_irqrestore(&ct->request_lock, flags);
>>	if (!request)
>> 		return true;
>>@@ -777,9 +842,9 @@ static int ct_handle_request(struct intel_guc_ct 
>>*ct, const u32 *msg)
>> 	}
>> 	memcpy(request->msg, msg, 4 * msglen);
>>-	spin_lock_irqsave(&ct->lock, flags);
>>+	spin_lock_irqsave(&ct->request_lock, flags);
>> 	list_add_tail(&request->link, &ct->incoming_requests);
>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>+	spin_unlock_irqrestore(&ct->request_lock, flags);
>>	queue_work(system_unbound_wq, &ct->worker);
>> 	return 0;
>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h 
>>b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>index 7c24d83f5c24..bc670a796bd8 100644
>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>@@ -62,8 +62,11 @@ struct intel_guc_ct {
>> 	struct intel_guc_ct_channel host_channel;
>> 	/* other channels are tbd */
>>-	/** @lock: protects pending requests list */
>>-	spinlock_t lock;
>>+	/** @request_lock: protects pending requests list */
>>+	spinlock_t request_lock;
>>+
>>+	/** @send_lock: protects h2g channel */
>>+	spinlock_t send_lock;
>>	/** @pending_requests: list of requests waiting for response */
>> 	struct list_head pending_requests;
>>@@ -73,6 +76,9 @@ struct intel_guc_ct {
>>	/** @worker: worker for handling incoming requests */
>> 	struct work_struct worker;
>>+
>>+	/** @retry: the number of times a H2G CTB has been retried */
>>+	u32 retry;
>> };
>>void intel_guc_ct_init_early(struct intel_guc_ct *ct);

* Re: [PATCH 2/3] drm/i915/guc: Optimized CTB writes and reads
@ 2019-11-22  1:29         ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2019-11-22  1:29 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: Intel-GFX

On Thu, Nov 21, 2019 at 07:56:07AM -0800, Matthew Brost wrote:
>On Thu, Nov 21, 2019 at 12:58:50PM +0100, Michal Wajdeczko wrote:
>>On Thu, 21 Nov 2019 00:56:03 +0100, <John.C.Harrison@intel.com> wrote:
>>
>>>From: Matthew Brost <matthew.brost@intel.com>
>>>
>>>CTB writes are now in the path of command submission and should be
>>>optimized for performance. Rather than reading CTB descriptor values
>>>(e.g. head, tail, size) which could result in accesses across the PCIe
>>>bus, store shadow local copies and only read/write the descriptor
>>>values when absolutely necessary.
>>>
>>>Cc: John Harrison <john.c.harrison@intel.com>
>>>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>---
>>>drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 79 +++++++++++------------
>>>drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  8 +++
>>>2 files changed, 45 insertions(+), 42 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
>>>b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>index e50d968b15d5..4d8a4c6afd71 100644
>>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>@@ -68,23 +68,29 @@ static inline const char 
>>>*guc_ct_buffer_type_to_str(u32 type)
>>>	}
>>>}
>>>-static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
>>>+static void guc_ct_buffer_desc_init(struct intel_guc_ct_buffer *ctb,
>>>				    u32 cmds_addr, u32 size, u32 owner)
>>
>>as this function now takes ctb instead of desc, it should be renamed
>>or made separate from guc_ct_buffer_desc_init
>>
>
>Yes, makes sense.
>
>>>{
>>>+	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>	CT_DEBUG_DRIVER("CT: desc %p init addr=%#x size=%u owner=%u\n",
>>>			desc, cmds_addr, size, owner);
>>>	memset(desc, 0, sizeof(*desc));
>>>	desc->addr = cmds_addr;
>>>-	desc->size = size;
>>>+	ctb->size = desc->size = size;
>>>	desc->owner = owner;
>>>+	ctb->tail = 0;
>>>+	ctb->head = 0;
>>>+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>>>}
>>>-static void guc_ct_buffer_desc_reset(struct guc_ct_buffer_desc *desc)
>>>+static void guc_ct_buffer_desc_reset(struct intel_guc_ct_buffer *ctb)
>>
>>same here
>>
>
>Same.
>
>>>{
>>>+	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>	CT_DEBUG_DRIVER("CT: desc %p reset head=%u tail=%u\n",
>>>			desc, desc->head, desc->tail);
>>>-	desc->head = 0;
>>>-	desc->tail = 0;
>>>+	ctb->head = desc->head = 0;
>>>+	ctb->tail = desc->tail = 0;
>>>+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>>>	desc->is_in_error = 0;
>>>}
>>>@@ -220,7 +226,7 @@ static int ctch_enable(struct intel_guc *guc,
>>>	 */
>>>	for (i = 0; i < ARRAY_SIZE(ctch->ctbs); i++) {
>>>		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
>>>-		guc_ct_buffer_desc_init(ctch->ctbs[i].desc,
>>>+		guc_ct_buffer_desc_init(&ctch->ctbs[i],
>>>					base + PAGE_SIZE/4 * i + PAGE_SIZE/2,
>>>					PAGE_SIZE/4,
>>>					ctch->owner);
>>>@@ -301,32 +307,16 @@ static int ctb_write(struct 
>>>intel_guc_ct_buffer *ctb,
>>>		     bool want_response)
>>>{
>>>	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>-	u32 head = desc->head / 4;	/* in dwords */
>>>-	u32 tail = desc->tail / 4;	/* in dwords */
>>>-	u32 size = desc->size / 4;	/* in dwords */
>>>-	u32 used;			/* in dwords */
>>>+	u32 tail = ctb->tail / 4;	/* in dwords */
>>>+	u32 size = ctb->size / 4;	/* in dwords */
>>>	u32 header;
>>>	u32 *cmds = ctb->cmds;
>>>	unsigned int i;
>>>-	GEM_BUG_ON(desc->size % 4);
>>>-	GEM_BUG_ON(desc->head % 4);
>>>-	GEM_BUG_ON(desc->tail % 4);
>>>+	GEM_BUG_ON(ctb->size % 4);
>>>+	GEM_BUG_ON(ctb->tail % 4);
>>>	GEM_BUG_ON(tail >= size);
>>>-	/*
>>>-	 * tail == head condition indicates empty. GuC FW does not support
>>>-	 * using up the entire buffer to get tail == head meaning full.
>>>-	 */
>>>-	if (tail < head)
>>>-		used = (size - head) + tail;
>>>-	else
>>>-		used = tail - head;
>>>-
>>>-	/* make sure there is a space including extra dw for the fence */
>>>-	if (unlikely(used + len + 1 >= size))
>>>-		return -ENOSPC;
>>>-
>>>	/*
>>>	 * Write the message. The format is the following:
>>>	 * DW0: header (including action code)
>>>@@ -354,15 +344,16 @@ static int ctb_write(struct 
>>>intel_guc_ct_buffer *ctb,
>>>	}
>>>	/* now update desc tail (back in bytes) */
>>>-	desc->tail = tail * 4;
>>>-	GEM_BUG_ON(desc->tail > desc->size);
>>>+	ctb->tail = desc->tail = tail * 4;
>>>+	ctb->space -= (len + 1) * 4;
>>>+	GEM_BUG_ON(ctb->tail > ctb->size);
>>>	return 0;
>>>}
>>>/**
>>> * wait_for_ctb_desc_update - Wait for the CT buffer descriptor update.
>>>- * @desc:	buffer descriptor
>>>+ * @ctb:	ctb buffer
>>> * @fence:	response fence
>>> * @status:	placeholder for status
>>> *
>>>@@ -376,11 +367,12 @@ static int ctb_write(struct 
>>>intel_guc_ct_buffer *ctb,
>>> * *	-ETIMEDOUT no response within hardcoded timeout
>>> * *	-EPROTO no response, CT buffer is in error
>>> */
>>>-static int wait_for_ctb_desc_update(struct guc_ct_buffer_desc *desc,
>>>+static int wait_for_ctb_desc_update(struct intel_guc_ct_buffer *ctb,
>>>				    u32 fence,
>>>				    u32 *status)
>>>{
>>>	int err;
>>>+	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>	/*
>>>	 * Fast commands should complete in less than 10us, so sample quickly
>>>@@ -401,7 +393,7 @@ static int wait_for_ctb_desc_update(struct 
>>>guc_ct_buffer_desc *desc,
>>>			/* Something went wrong with the messaging, try to reset
>>>			 * the buffer and hope for the best
>>>			 */
>>>-			guc_ct_buffer_desc_reset(desc);
>>>+			guc_ct_buffer_desc_reset(ctb);
>>>			err = -EPROTO;
>>>		}
>>>	}
>>>@@ -446,12 +438,17 @@ static int wait_for_ct_request_update(struct 
>>>ct_request *req, u32 *status)
>>>	return err;
>>>}
>>>-static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, 
>>>u32 len)
>>>+static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, 
>>>u32 len)
>>>{
>>>-	u32 head = READ_ONCE(desc->head);
>>>+	u32 head;
>>>	u32 space;
>>>-	space = CIRC_SPACE(desc->tail, head, desc->size);
>>>+	if (ctb->space >= len)
>>>+		return true;
>>>+
>>>+	head = READ_ONCE(ctb->desc->head);
>>>+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
>>>+	ctb->space = space;
>>>	return space >= len;
>>>}
>>>@@ -462,7 +459,6 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
>>>{
>>>	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>>>	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>>-	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>	int err;
>>>	GEM_BUG_ON(!ctch->enabled);
>>>@@ -470,7 +466,7 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
>>>	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>	lockdep_assert_held(&ct->send_lock);
>>>-	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>+	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
>>>		ct->retry++;
>>>		if (ct->retry >= MAX_RETRY)
>>>			return -EDEADLK;
>>>@@ -496,7 +492,6 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>		     u32 *status)
>>>{
>>>	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>>-	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>	struct ct_request request;
>>>	unsigned long flags;
>>>	u32 fence;
>>>@@ -514,7 +509,7 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>	 */
>>>retry:
>>>	spin_lock_irqsave(&ct->send_lock, flags);
>>>-	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>+	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
>>>		spin_unlock_irqrestore(&ct->send_lock, flags);
>>>		ct->retry++;
>>>		if (ct->retry >= MAX_RETRY)
>>>@@ -544,7 +539,7 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>	if (response_buf)
>>>		err = wait_for_ct_request_update(&request, status);
>>>	else
>>>-		err = wait_for_ctb_desc_update(desc, fence, status);
>>>+		err = wait_for_ctb_desc_update(ctb, fence, status);
>>>	if (unlikely(err))
>>>		goto unlink;
>>>@@ -618,9 +613,9 @@ static inline bool ct_header_is_response(u32 header)
>>>static int ctb_read(struct intel_guc_ct_buffer *ctb, u32 *data)
>>>{
>>>	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>-	u32 head = desc->head / 4;	/* in dwords */
>>>+	u32 head = ctb->head / 4;	/* in dwords */
>>>	u32 tail = desc->tail / 4;	/* in dwords */
>>>-	u32 size = desc->size / 4;	/* in dwords */
>>>+	u32 size = ctb->size / 4;	/* in dwords */
>>>	u32 *cmds = ctb->cmds;
>>>	s32 available;			/* in dwords */
>>>	unsigned int len;
>>>@@ -664,7 +659,7 @@ static int ctb_read(struct intel_guc_ct_buffer 
>>>*ctb, u32 *data)
>>>	}
>>>	CT_DEBUG_DRIVER("CT: received %*ph\n", 4 * len, data);
>>>-	desc->head = head * 4;
>>>+	ctb->head = desc->head = head * 4;
>>>	return 0;
>>>}
>>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h 
>>>b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>index bc670a796bd8..1bff4f0b91f7 100644
>>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>@@ -29,10 +29,18 @@ struct intel_guc;
>>> *
>>> * @desc: pointer to the buffer descriptor
>>> * @cmds: pointer to the commands buffer
>>>+ * @size: local shadow copy of size
>>
>>I would rather expect this as real fixed size,
>>note that size is not expected to change
>>
>
>Yes, it is fixed over the life of the CTB, but not all CTBs need to be the same
>size. E.g. the H2G and G2H buffers may, and likely will, be different sizes with
>the new GuC interface.
>
>>>+ * @head: local shadow copy of head
>>>+ * @tail: local shadow copy of tail
>>>+ * @space: local shadow copy of space
>>> */
>>>struct intel_guc_ct_buffer {
>>>	struct guc_ct_buffer_desc *desc;
>>>	u32 *cmds;
>>>+	u32 size;
>>>+	u32 tail;
>>>+	u32 head;
>>>+	u32 space;
>>>};
>>>/** Represents pair of command transport buffers.
>>
>>Can we reorder this patch to be first in the series ?
>>
>>Michal

Tried this and I think it makes more sense the way it is. The 'space' value
doesn't have a meaning before the non blocking call is introduced. Also it ends
up changing a bunch of code that is then deleted in the non blocking call patch.
Better to leave it as is.

Matt
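
For readers following the thread, here is a minimal user-space sketch of the
cached 'space' idea in ctb_has_room(): trust the local shadow value first and
only re-read the consumer's index when the cached value is too small. All
demo_* names and RING_SPACE are hypothetical stand-ins, not i915 code; the
macro is assumed to match CIRC_SPACE() from <linux/circ_buf.h>, and a single
producer with a power-of-two size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed equivalent of CIRC_SPACE(write, read, size); size is a power of two. */
#define RING_SPACE(write, read, size) (((read) - ((write) + 1)) & ((size) - 1))

/* Hypothetical stand-in for struct intel_guc_ct_buffer. */
struct demo_ring {
	volatile uint32_t *remote_read;	/* plays the role of desc->head */
	uint32_t size;			/* in bytes, power of two */
	uint32_t write;			/* local shadow of the producer offset */
	uint32_t space;			/* cached free bytes, may underestimate */
};

/* Fast path uses the cache; slow path refreshes it from the shared index. */
static bool demo_ring_has_room(struct demo_ring *r, uint32_t len)
{
	if (r->space >= len)
		return true;

	r->space = RING_SPACE(r->write, *r->remote_read, r->size);
	return r->space >= len;
}

/* Account for len bytes just written, as ctb_write() does for ctb->space. */
static void demo_ring_commit(struct demo_ring *r, uint32_t len)
{
	assert(len <= r->space);
	r->write = (r->write + len) & (r->size - 1);
	r->space -= len;
}
```

The cached value can only be too small, never too large, because the consumer
only ever frees space. That is why a stale cache costs an extra read at worst
rather than an overrun.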

>
>Yes.
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] drm/i915/guc: Add non blocking CTB send function
@ 2019-11-22  1:34         ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2019-11-22  1:34 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: Intel-GFX

On Thu, Nov 21, 2019 at 04:13:25PM -0800, Matthew Brost wrote:
>On Thu, Nov 21, 2019 at 12:43:26PM +0100, Michal Wajdeczko wrote:
>>On Thu, 21 Nov 2019 00:56:02 +0100, <John.C.Harrison@intel.com> wrote:
>>
>>>From: Matthew Brost <matthew.brost@intel.com>
>>>
>>>Add non blocking CTB send fuction, intel_guc_send_nb. In order to
>>>support a non blocking CTB send fuction a spin lock is needed to
>>
>>2x typos
>>
>>>protect the CTB descriptors fields. Also the non blocking call must not
>>>update the fence value as this value is owned by the blocking call
>>>(intel_guc_send).
>>
>>you probably mean "intel_guc_send_ct", as intel_guc_send is just a wrapper
>>around guc->send
>>
>
>Ah, yes.
>
>>>
>>>The blocking CTB now must have a flow control mechanism to ensure the
>>>buffer isn't overrun. A lazy spin wait is used as we believe the flow
>>>control condition should be rare with properly sized buffer. A retry
>>>counter is also implemented which fails H2G CTBs once a limit is
>>>reached to prevent deadlock.
>>>
>>>The function, intel_guc_send_nb, is exported in this patch but unused.
>>>Several patches later in the series make use of this function.
>>
>>It's likely in yet another series
>>
>
>Yes, it is.
>
>>>
>>>Cc: John Harrison <john.c.harrison@intel.com>
>>>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>---
>>>drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  2 +
>>>drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 97 +++++++++++++++++++----
>>>drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 10 ++-
>>>3 files changed, 91 insertions(+), 18 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
>>>b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>index e6400204a2bd..77c5af919ace 100644
>>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>@@ -94,6 +94,8 @@ intel_guc_send_and_receive(struct intel_guc 
>>>*guc, const u32 *action, u32 len,
>>>	return guc->send(guc, action, len, response_buf, response_buf_size);
>>>}
>>>+int intel_guc_send_nb(struct intel_guc_ct *ct, const u32 *action, 
>>>u32 len);
>>>+
>>
>>Hmm, this mismatch of guc/ct parameter breaks our layering.
>>But we can keep this layering intact by introducing some flags to
>>the existing guc_send() function. These flags could be passed as
>>high bits in action[0], like this:
>
>This seems reasonable.
>

Prototyped this and I don't like it at all. First, 'action' is a const, so what
you are suggesting doesn't work unless that is changed. Also, what if all bits in
the DW eventually mean something? To me, overloading a field isn't a good idea;
if anything, we should add another argument to guc->send(). But I'd honestly
prefer we just leave it as is. Non-blocking only applies to CTs (not MMIO) and we
have GEM_BUG_ON to protect us if this function is called incorrectly. Doing what
you suggest just makes everything more complicated IMO.

Matt
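
To make the "another argument" alternative concrete, here is a minimal
user-space sketch. It only illustrates the shape of such an interface: every
demo_* name and DEMO_SEND_FLAG_NONBLOCK is hypothetical, and none of this is
taken from the i915 sources. The point is that a separate flags parameter
leaves the const action array untouched and keeps all 32 bits of action[0]
available.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_SEND_FLAG_NONBLOCK	(1u << 0)	/* hypothetical flag */

enum demo_path { DEMO_PATH_BLOCKING, DEMO_PATH_NONBLOCKING };

/* Hypothetical MMIO back-end: it has no non-blocking mode, so reject the flag. */
static int demo_send_mmio(const uint32_t *action, uint32_t len, uint32_t flags)
{
	(void)action;		/* payload is not inspected in this sketch */

	if (!len || (flags & DEMO_SEND_FLAG_NONBLOCK))
		return -EINVAL;
	return 0;
}

/* Hypothetical CT back-end: choose the path from flags, never from action[0]. */
static int demo_send_ct(const uint32_t *action, uint32_t len, uint32_t flags,
			enum demo_path *path)
{
	(void)action;

	if (!len)
		return -EINVAL;

	*path = (flags & DEMO_SEND_FLAG_NONBLOCK) ?
		DEMO_PATH_NONBLOCKING : DEMO_PATH_BLOCKING;
	return 0;
}
```

A caller would pass the flag next to the action array, for example
demo_send_ct(action, len, DEMO_SEND_FLAG_NONBLOCK, &path), and the array itself
is never modified.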

>>
>>#define GUC_ACTION_FLAG_DONT_WAIT 0x80000000
>>
>>int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
>>{
>>	u32 action[] = {
>>		INTEL_GUC_ACTION_AUTHENTICATE_HUC | GUC_ACTION_FLAG_DONT_WAIT,
>>		rsa_offset
>>	};
>>
>>	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>>}
>>
>>then actual back-end of guc->send can take proper steps based on this flag:
>>
>>@@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, 
>>u32 len,
>>       GEM_BUG_ON(!len);
>>       GEM_BUG_ON(len > guc->send_regs.count);
>>
>>+       if (*action & GUC_ACTION_FLAG_DONT_WAIT)
>>+               return -EINVAL;
>>+       *action &= ~GUC_ACTION_FLAG_DONT_WAIT;
>>+
>>       /* We expect only action code */
>>       GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK);
>>
>>@@ @@ int intel_guc_send_ct(struct intel_guc *guc, const u32 
>>*action, u32 len,
>>       u32 status = ~0; /* undefined */
>>       int ret;
>>
>>+       if (*action & GUC_ACTION_FLAG_DONT_WAIT) {
>>+               GEM_BUG_ON(response_buf);
>>+               GEM_BUG_ON(response_buf_size);
>>+               return ctch_send_nb(ct, ctch, action, len);
>>+       }
>>+
>>       mutex_lock(&guc->send_mutex);
>>
>>       ret = ctch_send(ct, ctch, action, len, response_buf, 
>>response_buf_size,
>>
>>
>>>static inline void intel_guc_notify(struct intel_guc *guc)
>>>{
>>>	guc->notify(guc);
>>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
>>>b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>index b49115517510..e50d968b15d5 100644
>>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>@@ -3,6 +3,8 @@
>>> * Copyright © 2016-2019 Intel Corporation
>>> */
>>>+#include <linux/circ_buf.h>
>>>+
>>>#include "i915_drv.h"
>>>#include "intel_guc_ct.h"
>>>@@ -12,6 +14,8 @@
>>>#define CT_DEBUG_DRIVER(...)	do { } while (0)
>>>#endif
>>>+#define MAX_RETRY		0x1000000
>>>+
>>>struct ct_request {
>>>	struct list_head link;
>>>	u32 fence;
>>>@@ -40,7 +44,8 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>>>	/* we're using static channel owners */
>>>	ct->host_channel.owner = CTB_OWNER_HOST;
>>>-	spin_lock_init(&ct->lock);
>>>+	spin_lock_init(&ct->request_lock);
>>>+	spin_lock_init(&ct->send_lock);
>>>	INIT_LIST_HEAD(&ct->pending_requests);
>>>	INIT_LIST_HEAD(&ct->incoming_requests);
>>>	INIT_WORK(&ct->worker, ct_incoming_request_worker_func);
>>>@@ -291,7 +296,8 @@ static u32 ctch_get_next_fence(struct 
>>>intel_guc_ct_channel *ctch)
>>>static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>>		     const u32 *action,
>>>		     u32 len /* in dwords */,
>>>-		     u32 fence,
>>>+		     u32 fence_value,
>>>+		     bool enable_fence,
>>
>>maybe we can just guarantee that fence=0 will never be used as a valid
>>fence id, then this flag could be replaced with (fence != 0) check.
>>
>
>Yes, again seems reasonable. Initialize next_fence = 1, then increment by 2 each
>time and this works.
>
>>>		     bool want_response)
>>>{
>>>	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>@@ -328,18 +334,18 @@ static int ctb_write(struct 
>>>intel_guc_ct_buffer *ctb,
>>>	 * DW2+: action data
>>>	 */
>>>	header = (len << GUC_CT_MSG_LEN_SHIFT) |
>>>-		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
>>>+		 (enable_fence ? GUC_CT_MSG_WRITE_FENCE_TO_DESC : 0) |
>>
>>Hmm, even if we ask fw not to write back the fence to the descriptor,
>>IIRC current firmware will unconditionally write back return status
>>of this non-blocking call, possibly overwriting status of the blocked
>>call.
>>
>
>Yes, known problem with the interface that needs to be fixed.
>
>>>		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
>>
>>btw, if we switch all requests to expect reply send back over CTB,
>>then we can possibly drop the send_mutex in CTB paths, and block
>>only when there is no DONT_WAIT flag and we have to wait for response.
>>
>
>Rather just wait for the GuC to fix this.
>
>>>		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
>>>	CT_DEBUG_DRIVER("CT: writing %*ph %*ph %*ph\n",
>>>-			4, &header, 4, &fence,
>>>+			4, &header, 4, &fence_value,
>>>			4 * (len - 1), &action[1]);
>>>	cmds[tail] = header;
>>>	tail = (tail + 1) % size;
>>>-	cmds[tail] = fence;
>>>+	cmds[tail] = fence_value;
>>>	tail = (tail + 1) % size;
>>>	for (i = 1; i < len; i++) {
>>>@@ -440,6 +446,47 @@ static int wait_for_ct_request_update(struct 
>>>ct_request *req, u32 *status)
>>>	return err;
>>>}
>>>+static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, 
>>>u32 len)
>>>+{
>>>+	u32 head = READ_ONCE(desc->head);
>>>+	u32 space;
>>>+
>>>+	space = CIRC_SPACE(desc->tail, head, desc->size);
>>>+
>>>+	return space >= len;
>>>+}
>>>+
>>>+int intel_guc_send_nb(struct intel_guc_ct *ct,
>>>+		      const u32 *action,
>>>+		      u32 len)
>>>+{
>>>+	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>>>+	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>>+	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>+	int err;
>>>+
>>>+	GEM_BUG_ON(!ctch->enabled);
>>>+	GEM_BUG_ON(!len);
>>>+	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>+	lockdep_assert_held(&ct->send_lock);
>>
>>hmm, does it mean that it's now the caller's responsibility to take the
>>CT private spinlock? That is not how other guc_send() functions work.
>>
>
>Yes, that is how I would like this to work, as I feel it gives more flexibility
>to the caller in the -EBUSY case. The caller can call intel_guc_send_nb again
>while still holding the lock, or it can release the lock and use a different
>form of flow control. Perhaps locking / unlocking should be exposed via static
>inlines rather than the caller directly manipulating the lock?
>
>>>+
>>>+	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>+		ct->retry++;
>>>+		if (ct->retry >= MAX_RETRY)
>>>+			return -EDEADLK;
>>>+		else
>>>+			return -EBUSY;
>>>+	}
>>>+
>>>+	ct->retry = 0;
>>>+	err = ctb_write(ctb, action, len, 0, false, false);
>>>+	if (unlikely(err))
>>>+		return err;
>>>+
>>>+	intel_guc_notify(ct_to_guc(ct));
>>>+	return 0;
>>>+}
>>>+
>>>static int ctch_send(struct intel_guc_ct *ct,
>>>		     struct intel_guc_ct_channel *ctch,
>>>		     const u32 *action,
>>>@@ -460,17 +507,35 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>	GEM_BUG_ON(!response_buf && response_buf_size);
>>>+	/*
>>>+	 * We use a lazy spin wait loop here as we believe that if the CT
>>>+	 * buffers are sized correctly the flow control condition should be
>>>+	 * rare.
>>>+	 */
>>>+retry:
>>>+	spin_lock_irqsave(&ct->send_lock, flags);
>>>+	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>+		spin_unlock_irqrestore(&ct->send_lock, flags);
>>>+		ct->retry++;
>>>+		if (ct->retry >= MAX_RETRY)
>>>+			return -EDEADLK;
>>
>>I'm not sure what's better: a hidden deadlock that is hard to reproduce, or
>>deadlocks that are easier to catch and so help us become deadlock-clean
>>
>
>This covers the case where the GuC has died, to avoid deadlock. Eventually
>we will have some GuC health check code that will trigger a full GPU reset if
>the GuC has died. We need a way for code spinning on the CTBs to exit.
>
>I've already tweaked this code locally a bit to use an atomic too, with the idea
>being that the GuC health check code can set this value to have all code
>spinning on CTBs immediately return -EDEADLK when the GuC has died.
>
>>>+		cpu_relax();
>>>+		goto retry;
>>>+	}
>>>+
>>>+	ct->retry = 0;
>>>	fence = ctch_get_next_fence(ctch);
>>>	request.fence = fence;
>>>	request.status = 0;
>>>	request.response_len = response_buf_size;
>>>	request.response_buf = response_buf;
>>>-	spin_lock_irqsave(&ct->lock, flags);
>>>+	spin_lock(&ct->request_lock);
>>>	list_add_tail(&request.link, &ct->pending_requests);
>>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>>+	spin_unlock(&ct->request_lock);
>>>-	err = ctb_write(ctb, action, len, fence, !!response_buf);
>>>+	err = ctb_write(ctb, action, len, fence, true, !!response_buf);
>>>+	spin_unlock_irqrestore(&ct->send_lock, flags);
>>>	if (unlikely(err))
>>>		goto unlink;
>>>@@ -501,9 +566,9 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>	}
>>>unlink:
>>>-	spin_lock_irqsave(&ct->lock, flags);
>>>+	spin_lock_irqsave(&ct->request_lock, flags);
>>>	list_del(&request.link);
>>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>>+	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>	return err;
>>>}
>>>@@ -653,7 +718,7 @@ static int ct_handle_response(struct 
>>>intel_guc_ct *ct, const u32 *msg)
>>>	CT_DEBUG_DRIVER("CT: response fence %u status %#x\n", fence, status);
>>>-	spin_lock(&ct->lock);
>>>+	spin_lock(&ct->request_lock);
>>>	list_for_each_entry(req, &ct->pending_requests, link) {
>>>		if (unlikely(fence != req->fence)) {
>>>			CT_DEBUG_DRIVER("CT: request %u awaits response\n",
>>>@@ -672,7 +737,7 @@ static int ct_handle_response(struct 
>>>intel_guc_ct *ct, const u32 *msg)
>>>		found = true;
>>>		break;
>>>	}
>>>-	spin_unlock(&ct->lock);
>>>+	spin_unlock(&ct->request_lock);
>>>	if (!found)
>>>		DRM_ERROR("CT: unsolicited response %*ph\n", 4 * msglen, msg);
>>>@@ -710,13 +775,13 @@ static bool 
>>>ct_process_incoming_requests(struct intel_guc_ct *ct)
>>>	u32 *payload;
>>>	bool done;
>>>-	spin_lock_irqsave(&ct->lock, flags);
>>>+	spin_lock_irqsave(&ct->request_lock, flags);
>>>	request = list_first_entry_or_null(&ct->incoming_requests,
>>>					   struct ct_incoming_request, link);
>>>	if (request)
>>>		list_del(&request->link);
>>>	done = !!list_empty(&ct->incoming_requests);
>>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>>+	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>	if (!request)
>>>		return true;
>>>@@ -777,9 +842,9 @@ static int ct_handle_request(struct 
>>>intel_guc_ct *ct, const u32 *msg)
>>>	}
>>>	memcpy(request->msg, msg, 4 * msglen);
>>>-	spin_lock_irqsave(&ct->lock, flags);
>>>+	spin_lock_irqsave(&ct->request_lock, flags);
>>>	list_add_tail(&request->link, &ct->incoming_requests);
>>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>>+	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>	queue_work(system_unbound_wq, &ct->worker);
>>>	return 0;
>>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h 
>>>b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>index 7c24d83f5c24..bc670a796bd8 100644
>>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>@@ -62,8 +62,11 @@ struct intel_guc_ct {
>>>	struct intel_guc_ct_channel host_channel;
>>>	/* other channels are tbd */
>>>-	/** @lock: protects pending requests list */
>>>-	spinlock_t lock;
>>>+	/** @request_lock: protects pending requests list */
>>>+	spinlock_t request_lock;
>>>+
>>>+	/** @send_lock: protects h2g channel */
>>>+	spinlock_t send_lock;
>>>	/** @pending_requests: list of requests waiting for response */
>>>	struct list_head pending_requests;
>>>@@ -73,6 +76,9 @@ struct intel_guc_ct {
>>>	/** @worker: worker for handling incoming requests */
>>>	struct work_struct worker;
>>>+
>>>+	/** @retry: the number of times a H2G CTB has been retried */
>>>+	u32 retry;
>>>};
>>>void intel_guc_ct_init_early(struct intel_guc_ct *ct);
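
On the fence != 0 suggestion above ("Initialize next_fence = 1, then increment
by 2 each time"): a minimal user-space sketch of why that keeps 0 free as a
"no fence" marker, even across 32-bit wraparound. The demo_* names are
hypothetical and this is not the code from any posted revision of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the channel's next_fence counter. */
struct demo_fence_source {
	uint32_t next;
};

static void demo_fence_init(struct demo_fence_source *s)
{
	s->next = 1;		/* start odd so that 0 is never handed out */
}

/* Return the current fence and advance by 2, so every fence stays odd. */
static uint32_t demo_fence_get(struct demo_fence_source *s)
{
	uint32_t fence = s->next;

	s->next += 2;		/* unsigned wrap: 0xffffffff + 2 == 1 */
	return fence;
}
```

The cost is that only half of the 32-bit fence space is used, which should not
matter as long as far fewer than 2^31 requests are ever in flight at once.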

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Add non blocking CTB send function
@ 2019-11-22  1:34         ` Matthew Brost
  0 siblings, 0 replies; 36+ messages in thread
From: Matthew Brost @ 2019-11-22  1:34 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: Intel-GFX

On Thu, Nov 21, 2019 at 04:13:25PM -0800, Matthew Brost wrote:
>On Thu, Nov 21, 2019 at 12:43:26PM +0100, Michal Wajdeczko wrote:
>>On Thu, 21 Nov 2019 00:56:02 +0100, <John.C.Harrison@intel.com> wrote:
>>
>>>From: Matthew Brost <matthew.brost@intel.com>
>>>
>>>Add non blocking CTB send fuction, intel_guc_send_nb. In order to
>>>support a non blocking CTB send fuction a spin lock is needed to
>>
>>2x typos
>>
>>>protect the CTB descriptors fields. Also the non blocking call must not
>>>update the fence value as this value is owned by the blocking call
>>>(intel_guc_send).
>>
>>you probably mean "intel_guc_send_ct", as intel_guc_send is just a wrapper
>>around guc->send
>>
>
>Ah, yes.
>
>>>
>>>The blocking CTB now must have a flow control mechanism to ensure the
>>>buffer isn't overrun. A lazy spin wait is used as we believe the flow
>>>control condition should be rare with properly sized buffer. A retry
>>>counter is also implemented which fails H2G CTBs once a limit is
>>>reached to prevent deadlock.
>>>
>>>The function, intel_guc_send_nb, is exported in this patch but unused.
>>>Several patches later in the series make use of this function.
>>
>>It's likely in yet another series
>>
>
>Yes, it is.
>
>>>
>>>Cc: John Harrison <john.c.harrison@intel.com>
>>>Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>---
>>>drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  2 +
>>>drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 97 +++++++++++++++++++----
>>>drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 10 ++-
>>>3 files changed, 91 insertions(+), 18 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
>>>b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>index e6400204a2bd..77c5af919ace 100644
>>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>@@ -94,6 +94,8 @@ intel_guc_send_and_receive(struct intel_guc 
>>>*guc, const u32 *action, u32 len,
>>>	return guc->send(guc, action, len, response_buf, response_buf_size);
>>>}
>>>+int intel_guc_send_nb(struct intel_guc_ct *ct, const u32 *action, 
>>>u32 len);
>>>+
>>
>>Hmm, this mismatch of guc/ct parameter breaks our layering.
>>But we can keep this layering intact by introducing some flags to
>>the existing guc_send() function. These flags could be passed as
>>high bits in action[0], like this:
>
>This seems reasonable.
>

Prototyped this and I don't like it at all. First, 'action' is a const, so what
you are suggesting doesn't work unless that is changed. Also, what if all bits in
the DW eventually mean something? To me, overloading a field isn't a good idea;
if anything, we should add another argument to guc->send(). But I'd honestly
prefer we just leave it as is. Non-blocking only applies to CTs (not MMIO) and we
have GEM_BUG_ON to protect us if this function is called incorrectly. Doing what
you suggest just makes everything more complicated IMO.

Matt

>>
>>#define GUC_ACTION_FLAG_DONT_WAIT 0x80000000
>>
>>int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
>>{
>>	u32 action[] = {
>>		INTEL_GUC_ACTION_AUTHENTICATE_HUC | GUC_ACTION_FLAG_DONT_WAIT,
>>		rsa_offset
>>	};
>>
>>	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>>}
>>
>>then actual back-end of guc->send can take proper steps based on this flag:
>>
>>@@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, u32 len,
>>       GEM_BUG_ON(!len);
>>       GEM_BUG_ON(len > guc->send_regs.count);
>>
>>+       if (*action & GUC_ACTION_FLAG_DONT_WAIT)
>>+               return -EINVAL;
>>+       *action &= ~GUC_ACTION_FLAG_DONT_WAIT;
>>+
>>       /* We expect only action code */
>>       GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK);
>>
>>@@ int intel_guc_send_ct(struct intel_guc *guc, const u32 *action, u32 len,
>>       u32 status = ~0; /* undefined */
>>       int ret;
>>
>>+       if (*action & GUC_ACTION_FLAG_DONT_WAIT) {
>>+               GEM_BUG_ON(response_buf);
>>+               GEM_BUG_ON(response_buf_size);
>>+               return ctch_send_nb(ct, ctch, action, len);
>>+       }
>>+
>>       mutex_lock(&guc->send_mutex);
>>
>>       ret = ctch_send(ct, ctch, action, len, response_buf, response_buf_size,
>>
>>
>>>static inline void intel_guc_notify(struct intel_guc *guc)
>>>{
>>>	guc->notify(guc);
>>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>index b49115517510..e50d968b15d5 100644
>>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>@@ -3,6 +3,8 @@
>>> * Copyright © 2016-2019 Intel Corporation
>>> */
>>>+#include <linux/circ_buf.h>
>>>+
>>>#include "i915_drv.h"
>>>#include "intel_guc_ct.h"
>>>@@ -12,6 +14,8 @@
>>>#define CT_DEBUG_DRIVER(...)	do { } while (0)
>>>#endif
>>>+#define MAX_RETRY		0x1000000
>>>+
>>>struct ct_request {
>>>	struct list_head link;
>>>	u32 fence;
>>>@@ -40,7 +44,8 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>>>	/* we're using static channel owners */
>>>	ct->host_channel.owner = CTB_OWNER_HOST;
>>>-	spin_lock_init(&ct->lock);
>>>+	spin_lock_init(&ct->request_lock);
>>>+	spin_lock_init(&ct->send_lock);
>>>	INIT_LIST_HEAD(&ct->pending_requests);
>>>	INIT_LIST_HEAD(&ct->incoming_requests);
>>>	INIT_WORK(&ct->worker, ct_incoming_request_worker_func);
>>>@@ -291,7 +296,8 @@ static u32 ctch_get_next_fence(struct intel_guc_ct_channel *ctch)
>>>static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>>		     const u32 *action,
>>>		     u32 len /* in dwords */,
>>>-		     u32 fence,
>>>+		     u32 fence_value,
>>>+		     bool enable_fence,
>>
>>maybe we can just guarantee that fence=0 will never be used as a valid
>>fence id, then this flag could be replaced with (fence != 0) check.
>>
>
>Yes, again seems reasonable. Initialize next_fence = 1, then increment by 2 each
>time and this works.
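
A minimal userspace sketch of that odd-fence scheme, assuming a plain u32
counter. struct sketch_channel and the sketch_* helpers are hypothetical
stand-ins, not the driver's ctch_get_next_fence(). Starting at 1 and stepping
by 2 keeps the counter odd across u32 wraparound, so 0 is never handed out and
can mean "no fence".

```c
#include <assert.h>
#include <stdint.h>

struct sketch_channel {
	uint32_t next_fence; /* always odd, so never 0 */
};

static void sketch_channel_init(struct sketch_channel *ch)
{
	ch->next_fence = 1;
}

/* Return the current fence and advance by 2; u32 wraparound keeps it odd. */
static uint32_t sketch_get_next_fence(struct sketch_channel *ch)
{
	uint32_t fence = ch->next_fence;

	ch->next_fence += 2;
	return fence;
}

/* With 0 reserved, "fence enabled" needs no separate bool parameter. */
static int sketch_fence_enabled(uint32_t fence)
{
	return fence != 0;
}
```

The cost is that only half of the u32 space is used for fence ids.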
>
>>>		     bool want_response)
>>>{
>>>	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>@@ -328,18 +334,18 @@ static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>>	 * DW2+: action data
>>>	 */
>>>	header = (len << GUC_CT_MSG_LEN_SHIFT) |
>>>-		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
>>>+		 (enable_fence ? GUC_CT_MSG_WRITE_FENCE_TO_DESC : 0) |
>>
>>Hmm, even if we ask fw not to write back the fence to the descriptor,
>>IIRC current firmware will unconditionally write back the return status
>>of this non-blocking call, possibly overwriting the status of the blocked
>>call.
>>
>
>Yes, known problem with the interface that needs to be fixed.
>
>>>		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
>>
>>btw, if we switch all requests to expect the reply sent back over CTB,
>>then we can possibly drop the send_mutex in CTB paths, and block
>>only when there is no DONT_WAIT flag and we have to wait for a response.
>>
>
>I'd rather just wait for the GuC to fix this.
>
>>>		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
>>>	CT_DEBUG_DRIVER("CT: writing %*ph %*ph %*ph\n",
>>>-			4, &header, 4, &fence,
>>>+			4, &header, 4, &fence_value,
>>>			4 * (len - 1), &action[1]);
>>>	cmds[tail] = header;
>>>	tail = (tail + 1) % size;
>>>-	cmds[tail] = fence;
>>>+	cmds[tail] = fence_value;
>>>	tail = (tail + 1) % size;
>>>	for (i = 1; i < len; i++) {
>>>@@ -440,6 +446,47 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>>>	return err;
>>>}
>>>+static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32 len)
>>>+{
>>>+	u32 head = READ_ONCE(desc->head);
>>>+	u32 space;
>>>+
>>>+	space = CIRC_SPACE(desc->tail, head, desc->size);
>>>+
>>>+	return space >= len;
>>>+}
>>>+
>>>+int intel_guc_send_nb(struct intel_guc_ct *ct,
>>>+		      const u32 *action,
>>>+		      u32 len)
>>>+{
>>>+	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>>>+	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>>+	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>+	int err;
>>>+
>>>+	GEM_BUG_ON(!ctch->enabled);
>>>+	GEM_BUG_ON(!len);
>>>+	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>+	lockdep_assert_held(&ct->send_lock);
>>
>>hmm, does it mean that it's now the caller's responsibility to take the
>>CT private spinlock? That is not how other guc_send() functions work.
>>
>
>Yes, that's how I would like this to work, as I feel it gives more flexibility to
>the caller in the -EBUSY case. The caller can call intel_guc_send_nb again while
>still holding the lock, or it can release the lock and use a different form of flow
>control. Perhaps locking / unlocking should be exposed via static inlines rather
>than the caller directly manipulating the lock?
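
A rough userspace sketch of what such static inlines plus a caller-side retry
policy could look like. Everything named sketch_* is hypothetical: a C11
atomic_flag stands in for the spinlock, the ring carries only head/tail/size in
bytes, and SKETCH_MAX_RETRY is tiny for illustration (the patch uses
0x1000000). The space arithmetic mirrors CIRC_CNT/CIRC_SPACE from
<linux/circ_buf.h>, so the size must be a power of two.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Same arithmetic as CIRC_CNT/CIRC_SPACE in <linux/circ_buf.h>. */
#define SKETCH_CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define SKETCH_CIRC_SPACE(head, tail, size) SKETCH_CIRC_CNT((tail), ((head) + 1), (size))

#define SKETCH_MAX_RETRY 4u

struct sketch_ct {
	atomic_flag send_lock; /* stand-in for the send spinlock */
	uint32_t head;         /* consumer index, in bytes */
	uint32_t tail;         /* producer index, in bytes */
	uint32_t size;         /* buffer size in bytes, power of two */
	uint32_t retry;
};

static void sketch_ct_init(struct sketch_ct *ct, uint32_t size)
{
	atomic_flag_clear(&ct->send_lock);
	ct->head = ct->tail = ct->retry = 0;
	ct->size = size;
}

static inline void sketch_ct_send_lock(struct sketch_ct *ct)
{
	while (atomic_flag_test_and_set_explicit(&ct->send_lock,
						 memory_order_acquire))
		; /* spin */
}

static inline void sketch_ct_send_unlock(struct sketch_ct *ct)
{
	atomic_flag_clear_explicit(&ct->send_lock, memory_order_release);
}

/*
 * Caller must hold send_lock. A message of 'len' dwords occupies
 * len + 1 dwords in the buffer (header, fence, then action[1..]).
 */
static int sketch_send_nb(struct sketch_ct *ct, uint32_t len)
{
	uint32_t bytes = (len + 1) * 4;

	if (SKETCH_CIRC_SPACE(ct->tail, ct->head, ct->size) < bytes) {
		if (++ct->retry >= SKETCH_MAX_RETRY)
			return -EDEADLK;
		return -EBUSY;
	}

	ct->retry = 0;
	ct->tail = (ct->tail + bytes) & (ct->size - 1);
	return 0;
}

/* One possible caller policy: retry a bounded number of times under the lock. */
static int sketch_send_with_retry(struct sketch_ct *ct, uint32_t len,
				  unsigned int tries)
{
	int err = -EBUSY;

	sketch_ct_send_lock(ct);
	while (tries-- && err == -EBUSY)
		err = sketch_send_nb(ct, len);
	sketch_ct_send_unlock(ct);
	return err;
}
```

A caller preferring the other policy would drop the lock on -EBUSY and apply
its own flow control before trying again.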
>
>>>+
>>>+	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>+		ct->retry++;
>>>+		if (ct->retry >= MAX_RETRY)
>>>+			return -EDEADLK;
>>>+		else
>>>+			return -EBUSY;
>>>+	}
>>>+
>>>+	ct->retry = 0;
>>>+	err = ctb_write(ctb, action, len, 0, false, false);
>>>+	if (unlikely(err))
>>>+		return err;
>>>+
>>>+	intel_guc_notify(ct_to_guc(ct));
>>>+	return 0;
>>>+}
>>>+
>>>static int ctch_send(struct intel_guc_ct *ct,
>>>		     struct intel_guc_ct_channel *ctch,
>>>		     const u32 *action,
>>>@@ -460,17 +507,35 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>	GEM_BUG_ON(!response_buf && response_buf_size);
>>>+	/*
>>>+	 * We use a lazy spin wait loop here as we believe that if the CT
>>>+	 * buffers are sized correctly the flow control condition should be
>>>+	 * rare.
>>>+	 */
>>>+retry:
>>>+	spin_lock_irqsave(&ct->send_lock, flags);
>>>+	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>+		spin_unlock_irqrestore(&ct->send_lock, flags);
>>>+		ct->retry++;
>>>+		if (ct->retry >= MAX_RETRY)
>>>+			return -EDEADLK;
>>
>>I'm not sure what's better: a hidden deadlock that is hard to reproduce,
>>or deadlocks that are easier to catch, which helps us become deadlock-clean
>>
>
>This covers the case where the GuC has died, and is there to avoid deadlock. Eventually
>we will have some GuC health check code that will trigger a full GPU reset if
>the GuC has died. We need a way for code spinning on the CTBs to exit.
>
>I've already tweaked this code locally a bit to use an atomic too, with the idea
>being that the GuC health check code can set this value to make all code spinning on
>CTBs immediately return -EDEADLK when the GuC has died.
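
The mail does not say how the atomic is wired up, so the following is only one
possible reading, sketched in userspace with C11 atomics. struct
sketch_ct_state, sketch_mark_guc_dead() and sketch_wait_for_room() are
hypothetical names, and free_bytes is a stand-in for the space check on the
H2G buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_ct_state {
	atomic_bool guc_dead;   /* set by a (hypothetical) health check */
	atomic_uint free_bytes; /* stand-in for space in the H2G buffer */
};

/* Called by the health check once it decides the GuC is gone. */
static void sketch_mark_guc_dead(struct sketch_ct_state *st)
{
	atomic_store_explicit(&st->guc_dead, true, memory_order_release);
}

/*
 * Spin until 'bytes' of space appear. Bail out with -EDEADLK as soon as
 * the dead flag is seen, or once max_spins iterations have passed.
 */
static int sketch_wait_for_room(struct sketch_ct_state *st, uint32_t bytes,
				uint32_t max_spins)
{
	uint32_t spins;

	for (spins = 0; spins < max_spins; spins++) {
		if (atomic_load_explicit(&st->guc_dead, memory_order_acquire))
			return -EDEADLK;
		if (atomic_load_explicit(&st->free_bytes,
					 memory_order_acquire) >= bytes)
			return 0;
	}
	return -EDEADLK;
}
```

Checking the flag before the space check means a spinner exits on its next
iteration instead of waiting out the full retry budget.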
>
>>>+		cpu_relax();
>>>+		goto retry;
>>>+	}
>>>+
>>>+	ct->retry = 0;
>>>	fence = ctch_get_next_fence(ctch);
>>>	request.fence = fence;
>>>	request.status = 0;
>>>	request.response_len = response_buf_size;
>>>	request.response_buf = response_buf;
>>>-	spin_lock_irqsave(&ct->lock, flags);
>>>+	spin_lock(&ct->request_lock);
>>>	list_add_tail(&request.link, &ct->pending_requests);
>>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>>+	spin_unlock(&ct->request_lock);
>>>-	err = ctb_write(ctb, action, len, fence, !!response_buf);
>>>+	err = ctb_write(ctb, action, len, fence, true, !!response_buf);
>>>+	spin_unlock_irqrestore(&ct->send_lock, flags);
>>>	if (unlikely(err))
>>>		goto unlink;
>>>@@ -501,9 +566,9 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>	}
>>>unlink:
>>>-	spin_lock_irqsave(&ct->lock, flags);
>>>+	spin_lock_irqsave(&ct->request_lock, flags);
>>>	list_del(&request.link);
>>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>>+	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>	return err;
>>>}
>>>@@ -653,7 +718,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
>>>	CT_DEBUG_DRIVER("CT: response fence %u status %#x\n", fence, status);
>>>-	spin_lock(&ct->lock);
>>>+	spin_lock(&ct->request_lock);
>>>	list_for_each_entry(req, &ct->pending_requests, link) {
>>>		if (unlikely(fence != req->fence)) {
>>>			CT_DEBUG_DRIVER("CT: request %u awaits response\n",
>>>@@ -672,7 +737,7 @@ static int ct_handle_response(struct intel_guc_ct *ct, const u32 *msg)
>>>		found = true;
>>>		break;
>>>	}
>>>-	spin_unlock(&ct->lock);
>>>+	spin_unlock(&ct->request_lock);
>>>	if (!found)
>>>		DRM_ERROR("CT: unsolicited response %*ph\n", 4 * msglen, msg);
>>>@@ -710,13 +775,13 @@ static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
>>>	u32 *payload;
>>>	bool done;
>>>-	spin_lock_irqsave(&ct->lock, flags);
>>>+	spin_lock_irqsave(&ct->request_lock, flags);
>>>	request = list_first_entry_or_null(&ct->incoming_requests,
>>>					   struct ct_incoming_request, link);
>>>	if (request)
>>>		list_del(&request->link);
>>>	done = !!list_empty(&ct->incoming_requests);
>>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>>+	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>	if (!request)
>>>		return true;
>>>@@ -777,9 +842,9 @@ static int ct_handle_request(struct intel_guc_ct *ct, const u32 *msg)
>>>	}
>>>	memcpy(request->msg, msg, 4 * msglen);
>>>-	spin_lock_irqsave(&ct->lock, flags);
>>>+	spin_lock_irqsave(&ct->request_lock, flags);
>>>	list_add_tail(&request->link, &ct->incoming_requests);
>>>-	spin_unlock_irqrestore(&ct->lock, flags);
>>>+	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>	queue_work(system_unbound_wq, &ct->worker);
>>>	return 0;
>>>diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>index 7c24d83f5c24..bc670a796bd8 100644
>>>--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>@@ -62,8 +62,11 @@ struct intel_guc_ct {
>>>	struct intel_guc_ct_channel host_channel;
>>>	/* other channels are tbd */
>>>-	/** @lock: protects pending requests list */
>>>-	spinlock_t lock;
>>>+	/** @request_lock: protects pending requests list */
>>>+	spinlock_t request_lock;
>>>+
>>>+	/** @send_lock: protects h2g channel */
>>>+	spinlock_t send_lock;
>>>	/** @pending_requests: list of requests waiting for response */
>>>	struct list_head pending_requests;
>>>@@ -73,6 +76,9 @@ struct intel_guc_ct {
>>>	/** @worker: worker for handling incoming requests */
>>>	struct work_struct worker;
>>>+
>>>+	/** @retry: the number of times a H2G CTB has been retried */
>>>+	u32 retry;
>>>};
>>>void intel_guc_ct_init_early(struct intel_guc_ct *ct);
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* ✓ Fi.CI.IGT: success for drm/i915/guc: CTB improvements
@ 2019-11-22  4:45   ` Patchwork
  0 siblings, 0 replies; 36+ messages in thread
From: Patchwork @ 2019-11-22  4:45 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/guc: CTB improvements
URL   : https://patchwork.freedesktop.org/series/69788/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_7394_full -> Patchwork_15363_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_15363_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_busy@extended-parallel-vcs1:
    - shard-iclb:         [PASS][1] -> [SKIP][2] ([fdo#112080]) +5 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb4/igt@gem_busy@extended-parallel-vcs1.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb8/igt@gem_busy@extended-parallel-vcs1.html

  * igt@gem_ctx_isolation@vcs1-none:
    - shard-iclb:         [PASS][3] -> [SKIP][4] ([fdo#109276] / [fdo#112080]) +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb1/igt@gem_ctx_isolation@vcs1-none.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb3/igt@gem_ctx_isolation@vcs1-none.html

  * igt@gem_exec_schedule@preempt-queue-bsd:
    - shard-iclb:         [PASS][5] -> [SKIP][6] ([fdo#112146]) +3 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb6/igt@gem_exec_schedule@preempt-queue-bsd.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb1/igt@gem_exec_schedule@preempt-queue-bsd.html

  * igt@gem_exec_schedule@preempt-queue-bsd2:
    - shard-tglb:         [PASS][7] -> [INCOMPLETE][8] ([fdo#111606] / [fdo#111677])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-tglb7/igt@gem_exec_schedule@preempt-queue-bsd2.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-tglb6/igt@gem_exec_schedule@preempt-queue-bsd2.html

  * igt@gem_userptr_blits@dmabuf-unsync:
    - shard-hsw:          [PASS][9] -> [DMESG-WARN][10] ([fdo#111870])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-hsw7/igt@gem_userptr_blits@dmabuf-unsync.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-hsw7/igt@gem_userptr_blits@dmabuf-unsync.html

  * igt@gem_userptr_blits@sync-unmap:
    - shard-snb:          [PASS][11] -> [DMESG-WARN][12] ([fdo#111870])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-snb2/igt@gem_userptr_blits@sync-unmap.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-snb2/igt@gem_userptr_blits@sync-unmap.html

  * igt@kms_cursor_crc@pipe-a-cursor-suspend:
    - shard-tglb:         [PASS][13] -> [INCOMPLETE][14] ([fdo#111832] / [fdo#111850]) +1 similar issue
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-tglb2/igt@kms_cursor_crc@pipe-a-cursor-suspend.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-tglb1/igt@kms_cursor_crc@pipe-a-cursor-suspend.html

  * igt@kms_draw_crc@draw-method-xrgb2101010-pwrite-untiled:
    - shard-skl:          [PASS][15] -> [FAIL][16] ([fdo#103184] / [fdo#103232] / [fdo#108472])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-skl2/igt@kms_draw_crc@draw-method-xrgb2101010-pwrite-untiled.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-skl7/igt@kms_draw_crc@draw-method-xrgb2101010-pwrite-untiled.html

  * igt@kms_draw_crc@draw-method-xrgb8888-mmap-gtt-ytiled:
    - shard-skl:          [PASS][17] -> [FAIL][18] ([fdo#103184] / [fdo#103232] / [fdo#108145])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-skl2/igt@kms_draw_crc@draw-method-xrgb8888-mmap-gtt-ytiled.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-skl7/igt@kms_draw_crc@draw-method-xrgb8888-mmap-gtt-ytiled.html

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-skl:          [PASS][19] -> [FAIL][20] ([fdo#105363]) +1 similar issue
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-skl9/igt@kms_flip@flip-vs-expired-vblank.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-skl6/igt@kms_flip@flip-vs-expired-vblank.html

  * igt@kms_flip@flip-vs-suspend:
    - shard-skl:          [PASS][21] -> [INCOMPLETE][22] ([fdo#109507])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-skl3/igt@kms_flip@flip-vs-suspend.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-skl2/igt@kms_flip@flip-vs-suspend.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-kbl:          [PASS][23] -> [DMESG-WARN][24] ([fdo#108566]) +5 similar issues
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-kbl3/igt@kms_flip@flip-vs-suspend-interruptible.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-kbl2/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-pwrite:
    - shard-iclb:         [PASS][25] -> [FAIL][26] ([fdo#103167]) +2 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb1/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-pwrite.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb1/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-pwrite.html

  * igt@kms_frontbuffer_tracking@fbcpsr-suspend:
    - shard-tglb:         [PASS][27] -> [INCOMPLETE][28] ([fdo#111832] / [fdo#111850] / [fdo#111884])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-tglb9/igt@kms_frontbuffer_tracking@fbcpsr-suspend.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-tglb8/igt@kms_frontbuffer_tracking@fbcpsr-suspend.html

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          [PASS][29] -> [FAIL][30] ([fdo#108145] / [fdo#110403])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-skl5/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-skl3/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html

  * igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min:
    - shard-skl:          [PASS][31] -> [FAIL][32] ([fdo#108145])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-skl1/igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-skl2/igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min.html

  * igt@kms_plane_lowres@pipe-a-tiling-x:
    - shard-iclb:         [PASS][33] -> [FAIL][34] ([fdo#103166])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb5/igt@kms_plane_lowres@pipe-a-tiling-x.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb8/igt@kms_plane_lowres@pipe-a-tiling-x.html

  * igt@kms_psr@psr2_primary_mmap_cpu:
    - shard-iclb:         [PASS][35] -> [SKIP][36] ([fdo#109441]) +1 similar issue
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb2/igt@kms_psr@psr2_primary_mmap_cpu.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb4/igt@kms_psr@psr2_primary_mmap_cpu.html

  * igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend:
    - shard-kbl:          [PASS][37] -> [INCOMPLETE][38] ([fdo#103665])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-kbl1/igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-kbl3/igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend.html
    - shard-tglb:         [PASS][39] -> [INCOMPLETE][40] ([fdo#111850])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-tglb7/igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-tglb8/igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend.html

  * igt@perf@gen8-unprivileged-single-ctx-counters:
    - shard-skl:          [PASS][41] -> [INCOMPLETE][42] ([fdo#111747])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-skl6/igt@perf@gen8-unprivileged-single-ctx-counters.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-skl1/igt@perf@gen8-unprivileged-single-ctx-counters.html

  * igt@prime_vgem@fence-wait-bsd2:
    - shard-iclb:         [PASS][43] -> [SKIP][44] ([fdo#109276]) +12 similar issues
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb1/igt@prime_vgem@fence-wait-bsd2.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb6/igt@prime_vgem@fence-wait-bsd2.html

  
#### Possible fixes ####

  * igt@gem_ctx_isolation@vcs1-s3:
    - shard-tglb:         [INCOMPLETE][45] ([fdo#111832]) -> [PASS][46]
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-tglb5/igt@gem_ctx_isolation@vcs1-s3.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-tglb6/igt@gem_ctx_isolation@vcs1-s3.html

  * igt@gem_ctx_persistence@vcs1-cleanup:
    - shard-iclb:         [SKIP][47] ([fdo#109276] / [fdo#112080]) -> [PASS][48] +1 similar issue
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb3/igt@gem_ctx_persistence@vcs1-cleanup.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb4/igt@gem_ctx_persistence@vcs1-cleanup.html

  * igt@gem_ctx_switch@queue-heavy:
    - shard-tglb:         [INCOMPLETE][49] ([fdo#111747]) -> [PASS][50]
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-tglb6/igt@gem_ctx_switch@queue-heavy.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-tglb1/igt@gem_ctx_switch@queue-heavy.html

  * igt@gem_exec_balancer@smoke:
    - shard-iclb:         [SKIP][51] ([fdo#110854]) -> [PASS][52]
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb6/igt@gem_exec_balancer@smoke.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb1/igt@gem_exec_balancer@smoke.html

  * igt@gem_exec_create@forked:
    - shard-tglb:         [INCOMPLETE][53] ([fdo#108838] / [fdo#111747]) -> [PASS][54]
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-tglb6/igt@gem_exec_create@forked.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-tglb5/igt@gem_exec_create@forked.html

  * igt@gem_exec_parallel@vcs1-fds:
    - shard-iclb:         [SKIP][55] ([fdo#112080]) -> [PASS][56] +15 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb6/igt@gem_exec_parallel@vcs1-fds.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb1/igt@gem_exec_parallel@vcs1-fds.html

  * igt@gem_exec_schedule@wide-bsd:
    - shard-iclb:         [SKIP][57] ([fdo#112146]) -> [PASS][58] +2 similar issues
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb2/igt@gem_exec_schedule@wide-bsd.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb3/igt@gem_exec_schedule@wide-bsd.html

  * igt@gem_exec_suspend@basic-s3:
    - shard-tglb:         [INCOMPLETE][59] ([fdo#111736] / [fdo#111850]) -> [PASS][60]
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-tglb8/igt@gem_exec_suspend@basic-s3.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-tglb5/igt@gem_exec_suspend@basic-s3.html

  * igt@gem_softpin@noreloc-s3:
    - shard-skl:          [INCOMPLETE][61] ([fdo#104108]) -> [PASS][62]
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-skl10/igt@gem_softpin@noreloc-s3.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-skl7/igt@gem_softpin@noreloc-s3.html

  * igt@gem_userptr_blits@sync-unmap-cycles:
    - shard-snb:          [DMESG-WARN][63] ([fdo#111870]) -> [PASS][64]
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-snb2/igt@gem_userptr_blits@sync-unmap-cycles.html
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-snb1/igt@gem_userptr_blits@sync-unmap-cycles.html

  * igt@i915_selftest@live_execlists:
    - shard-kbl:          [INCOMPLETE][65] ([fdo#103665] / [fdo#112259]) -> [PASS][66]
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-kbl6/igt@i915_selftest@live_execlists.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-kbl3/igt@i915_selftest@live_execlists.html

  * igt@i915_selftest@live_gem_contexts:
    - shard-tglb:         [INCOMPLETE][67] ([fdo#111831]) -> [PASS][68]
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-tglb6/igt@i915_selftest@live_gem_contexts.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-tglb9/igt@i915_selftest@live_gem_contexts.html

  * igt@kms_cursor_crc@pipe-c-cursor-suspend:
    - shard-kbl:          [DMESG-WARN][69] ([fdo#108566]) -> [PASS][70] +4 similar issues
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-kbl4/igt@kms_cursor_crc@pipe-c-cursor-suspend.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-kbl4/igt@kms_cursor_crc@pipe-c-cursor-suspend.html

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-tglb:         [INCOMPLETE][71] ([fdo#111747] / [fdo#111832] / [fdo#111850]) -> [PASS][72]
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-tglb4/igt@kms_fbcon_fbt@fbc-suspend.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-tglb9/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_flip@2x-flip-vs-suspend:
    - shard-hsw:          [INCOMPLETE][73] ([fdo#103540]) -> [PASS][74] +2 similar issues
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-hsw5/igt@kms_flip@2x-flip-vs-suspend.html
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-hsw6/igt@kms_flip@2x-flip-vs-suspend.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-apl:          [DMESG-WARN][75] ([fdo#108566]) -> [PASS][76] +3 similar issues
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-apl4/igt@kms_flip@flip-vs-suspend-interruptible.html
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-apl6/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-render:
    - shard-iclb:         [FAIL][77] ([fdo#103167]) -> [PASS][78] +6 similar issues
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb4/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-render.html
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb2/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-render:
    - shard-tglb:         [FAIL][79] ([fdo#103167]) -> [PASS][80] +2 similar issues
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-tglb9/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-render.html
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-tglb9/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@psr-slowdraw:
    - shard-skl:          [FAIL][81] ([fdo#103167]) -> [PASS][82]
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-skl8/igt@kms_frontbuffer_tracking@psr-slowdraw.html
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-skl4/igt@kms_frontbuffer_tracking@psr-slowdraw.html

  * igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min:
    - shard-skl:          [FAIL][83] ([fdo#108145]) -> [PASS][84]
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-skl8/igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min.html
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-skl4/igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min.html

  * igt@kms_psr@psr2_primary_blt:
    - shard-iclb:         [SKIP][85] ([fdo#109441]) -> [PASS][86]
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb1/igt@kms_psr@psr2_primary_blt.html
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb2/igt@kms_psr@psr2_primary_blt.html

  * igt@prime_busy@hang-bsd2:
    - shard-iclb:         [SKIP][87] ([fdo#109276]) -> [PASS][88] +11 similar issues
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb3/igt@prime_busy@hang-bsd2.html
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb4/igt@prime_busy@hang-bsd2.html

  
#### Warnings ####

  * igt@gem_ctx_isolation@vcs1-nonpriv-switch:
    - shard-iclb:         [SKIP][89] ([fdo#109276] / [fdo#112080]) -> [FAIL][90] ([fdo#111329])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-iclb6/igt@gem_ctx_isolation@vcs1-nonpriv-switch.html
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-iclb1/igt@gem_ctx_isolation@vcs1-nonpriv-switch.html

  * igt@gem_eio@kms:
    - shard-snb:          [DMESG-WARN][91] ([fdo#111781] / [fdo#112001]) -> [INCOMPLETE][92] ([fdo#105411])
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-snb1/igt@gem_eio@kms.html
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-snb5/igt@gem_eio@kms.html

  * igt@gem_exec_schedule@deep-bsd1:
    - shard-tglb:         [FAIL][93] ([fdo#111646]) -> [INCOMPLETE][94] ([fdo#111671]) +1 similar issue
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7394/shard-tglb5/igt@gem_exec_schedule@deep-bsd1.html
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/shard-tglb4/igt@gem_exec_schedule@deep-bsd1.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103184]: https://bugs.freedesktop.org/show_bug.cgi?id=103184
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#103540]: https://bugs.freedesktop.org/show_bug.cgi?id=103540
  [fdo#103665]: https://bugs.freedesktop.org/show_bug.cgi?id=103665
  [fdo#104108]: https://bugs.freedesktop.org/show_bug.cgi?id=104108
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#105411]: https://bugs.freedesktop.org/show_bug.cgi?id=105411
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108472]: https://bugs.freedesktop.org/show_bug.cgi?id=108472
  [fdo#108566]: https://bugs.freedesktop.org/show_bug.cgi?id=108566
  [fdo#108838]: https://bugs.freedesktop.org/show_bug.cgi?id=108838
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109507]: https://bugs.freedesktop.org/show_bug.cgi?id=109507
  [fdo#110403]: https://bugs.freedesktop.org/show_bug.cgi?id=110403
  [fdo#110854]: https://bugs.freedesktop.org/show_bug.cgi?id=110854
  [fdo#111329]: https://bugs.freedesktop.org/show_bug.cgi?id=111329
  [fdo#111606]: https://bugs.freedesktop.org/show_bug.cgi?id=111606
  [fdo#111646]: https://bugs.freedesktop.org/show_bug.cgi?id=111646
  [fdo#111671]: https://bugs.freedesktop.org/show_bug.cgi?id=111671
  [fdo#111677]: https://bugs.freedesktop.org/show_bug.cgi?id=111677
  [fdo#111736]: https://bugs.freedesktop.org/show_bug.cgi?id=111736
  [fdo#111747]: https://bugs.freedesktop.org/show_bug.cgi?id=111747
  [fdo#111781]: https://bugs.freedesktop.org/show_bug.cgi?id=111781
  [fdo#111831]: https://bugs.freedesktop.org/show_bug.cgi?id=111831
  [fdo#111832]: https://bugs.freedesktop.org/show_bug.cgi?id=111832
  [fdo#111850]: https://bugs.freedesktop.org/show_bug.cgi?id=111850
  [fdo#111870]: https://bugs.freedesktop.org/show_bug.cgi?id=111870
  [fdo#111884]: https://bugs.freedesktop.org/show_bug.cgi?id=111884
  [fdo#112001]: https://bugs.freedesktop.org/show_bug.cgi?id=112001
  [fdo#112080]: https://bugs.freedesktop.org/show_bug.cgi?id=112080
  [fdo#1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_15363/index.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] drm/i915/guc: Add non blocking CTB send function
@ 2019-11-25 13:20         ` Michal Wajdeczko
  0 siblings, 0 replies; 36+ messages in thread
From: Michal Wajdeczko @ 2019-11-25 13:20 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Intel-GFX

On Fri, 22 Nov 2019 01:13:25 +0100, Matthew Brost  
<matthew.brost@intel.com> wrote:

> On Thu, Nov 21, 2019 at 12:43:26PM +0100, Michal Wajdeczko wrote:
>> On Thu, 21 Nov 2019 00:56:02 +0100, <John.C.Harrison@intel.com> wrote:
>>
>>> From: Matthew Brost <matthew.brost@intel.com>
>>>
>>> Add non blocking CTB send fuction, intel_guc_send_nb. In order to
>>> support a non blocking CTB send fuction a spin lock is needed to
>>
>> 2x typos
>>
>>> protect the CTB descriptors fields. Also the non blocking call must not
>>> update the fence value as this value is owned by the blocking call
>>> (intel_guc_send).
>>
>> you probably mean "intel_guc_send_ct", as intel_guc_send is just a
>> wrapper around guc->send
>>
>
> Ah, yes.
>
>>>
>>> The blocking CTB now must have a flow control mechanism to ensure the
>>> buffer isn't overrun. A lazy spin wait is used as we believe the flow
>>> control condition should be rare with properly sized buffer. A retry
>>> counter is also implemented which fails H2G CTBs once a limit is
>>> reached to prevent deadlock.
>>>
>>> The function, intel_guc_send_nb, is exported in this patch but unused.
>>> Several patches later in the series make use of this function.
>>
>> It's likely in yet another series
>>
>
> Yes, it is.
>
>>>
>>> Cc: John Harrison <john.c.harrison@intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>> drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  2 +
>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 97 +++++++++++++++++++----
>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 10 ++-
>>> 3 files changed, 91 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h  
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> index e6400204a2bd..77c5af919ace 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> @@ -94,6 +94,8 @@ intel_guc_send_and_receive(struct intel_guc *guc,  
>>> const u32 *action, u32 len,
>>> 	return guc->send(guc, action, len, response_buf, response_buf_size);
>>> }
>>> +int intel_guc_send_nb(struct intel_guc_ct *ct, const u32 *action, u32  
>>> len);
>>> +
>>
>> Hmm, this mismatch of guc/ct parameters breaks our layering.
>> But we can keep this layering intact by introducing some flags to
>> the existing guc_send() function. These flags could be passed as
>> high bits in action[0], like this:
>
> This seems reasonable.
>
>>
>> #define GUC_ACTION_FLAG_DONT_WAIT 0x80000000
>>
>> int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
>> {
>> 	u32 action[] = {
>> 		INTEL_GUC_ACTION_AUTHENTICATE_HUC | GUC_ACTION_FLAG_DONT_WAIT,
>> 		rsa_offset
>> 	};
>>
>> 	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>> }
>>
>> then the actual back-end of guc->send can take proper steps based on
>> this flag:
>>
>> @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action,  
>> u32 len,
>>        GEM_BUG_ON(!len);
>>        GEM_BUG_ON(len > guc->send_regs.count);
>>
>> +       if (*action & GUC_ACTION_FLAG_DONT_WAIT)
>> +               return -EINVAL;
>> +       *action &= ~GUC_ACTION_FLAG_DONT_WAIT;
>> +
>>        /* We expect only action code */
>>        GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK);
>>
>> @@ @@ int intel_guc_send_ct(struct intel_guc *guc, const u32 *action,  
>> u32 len,
>>        u32 status = ~0; /* undefined */
>>        int ret;
>>
>> +       if (*action & GUC_ACTION_FLAG_DONT_WAIT) {
>> +               GEM_BUG_ON(response_buf);
>> +               GEM_BUG_ON(response_buf_size);
>> +               return ctch_send_nb(ct, ctch, action, len);
>> +       }
>> +
>>        mutex_lock(&guc->send_mutex);
>>
>>        ret = ctch_send(ct, ctch, action, len, response_buf,  
>> response_buf_size,
>>
>>
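As a minimal, self-contained sketch of the flag idea above (plain user-space
C, not driver code): only GUC_ACTION_FLAG_DONT_WAIT and its value come from
the proposal. The sketch_send_mmio() and sketch_send_ct() functions, the
send_path enum, and the last_path / last_code variables are hypothetical
stand-ins used only to show how a back-end might reject or honour the flag
and strip it before the action code is used.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value as proposed in the review; everything else here is hypothetical. */
#define GUC_ACTION_FLAG_DONT_WAIT 0x80000000u

enum send_path { PATH_NONE, PATH_BLOCKING, PATH_NONBLOCKING };

/* Illustration only: remember what the last successful send did. */
static enum send_path last_path = PATH_NONE;
static uint32_t last_code;

/* Stand-in for an MMIO back-end, which has no way to honour DONT_WAIT. */
static int sketch_send_mmio(const uint32_t *action, size_t len)
{
	assert(action && len);

	if (action[0] & GUC_ACTION_FLAG_DONT_WAIT)
		return -EINVAL;

	last_code = action[0];
	last_path = PATH_BLOCKING;
	return 0;
}

/* Stand-in for a CT back-end: the flag selects the non-blocking path. */
static int sketch_send_ct(const uint32_t *action, size_t len)
{
	assert(action && len);

	/* Strip the flag so only the action code would reach the buffer. */
	last_code = action[0] & ~GUC_ACTION_FLAG_DONT_WAIT;
	last_path = (action[0] & GUC_ACTION_FLAG_DONT_WAIT) ?
		    PATH_NONBLOCKING : PATH_BLOCKING;
	return 0;
}
```

With this shape the caller keeps passing the same (action, len) pair to a
single send entry point, and each back-end decides on its own what
"don't wait" means for it.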
>>> static inline void intel_guc_notify(struct intel_guc *guc)
>>> {
>>> 	guc->notify(guc);
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c  
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index b49115517510..e50d968b15d5 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -3,6 +3,8 @@
>>>  * Copyright © 2016-2019 Intel Corporation
>>>  */
>>> +#include <linux/circ_buf.h>
>>> +
>>> #include "i915_drv.h"
>>> #include "intel_guc_ct.h"
>>> @@ -12,6 +14,8 @@
>>> #define CT_DEBUG_DRIVER(...)	do { } while (0)
>>> #endif
>>> +#define MAX_RETRY		0x1000000
>>> +
>>> struct ct_request {
>>> 	struct list_head link;
>>> 	u32 fence;
>>> @@ -40,7 +44,8 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>>> 	/* we're using static channel owners */
>>> 	ct->host_channel.owner = CTB_OWNER_HOST;
>>> -	spin_lock_init(&ct->lock);
>>> +	spin_lock_init(&ct->request_lock);
>>> +	spin_lock_init(&ct->send_lock);
>>> 	INIT_LIST_HEAD(&ct->pending_requests);
>>> 	INIT_LIST_HEAD(&ct->incoming_requests);
>>> 	INIT_WORK(&ct->worker, ct_incoming_request_worker_func);
>>> @@ -291,7 +296,8 @@ static u32 ctch_get_next_fence(struct  
>>> intel_guc_ct_channel *ctch)
>>> static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>> 		     const u32 *action,
>>> 		     u32 len /* in dwords */,
>>> -		     u32 fence,
>>> +		     u32 fence_value,
>>> +		     bool enable_fence,
>>
>> maybe we can just guarantee that fence=0 will never be used as a valid
>> fence id, then this flag could be replaced with (fence != 0) check.
>>
>
> Yes, again seems reasonable. Initialize next_fence = 1, then increment by 2
> each time and this works.

As we need a fence only for the low volume of blocking messages that require
a response, I'm not sure that we need to super-optimize get_fence to use +2
all the time - maybe just check for zero wrap?
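A minimal sketch of the "check for zero wrap" variant, assuming a plain u32
counter. The names sketch_channel and sketch_get_next_fence are hypothetical
and this is not the driver's ctch_get_next_fence(); it only shows that
skipping zero on wrap is enough to keep fence == 0 free to mean "no fence".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the channel state that owns the fence counter. */
struct sketch_channel {
	uint32_t next_fence;
};

/*
 * Return the next fence id, never handing out 0, so that a writer could
 * treat (fence != 0) as "write the fence back to the descriptor".
 */
static uint32_t sketch_get_next_fence(struct sketch_channel *ch)
{
	uint32_t fence;

	assert(ch);

	fence = ++ch->next_fence;
	if (fence == 0)		/* counter wrapped: skip the reserved value */
		fence = ++ch->next_fence;

	return fence;
}
```

The extra branch is only taken once every 2^32 requests, so it costs nothing
measurable on a low-volume blocking path.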

>
>>> 		     bool want_response)
>>> {
>>> 	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> @@ -328,18 +334,18 @@ static int ctb_write(struct intel_guc_ct_buffer  
>>> *ctb,
>>> 	 * DW2+: action data
>>> 	 */
>>> 	header = (len << GUC_CT_MSG_LEN_SHIFT) |
>>> -		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
>>> +		 (enable_fence ? GUC_CT_MSG_WRITE_FENCE_TO_DESC : 0) |
>>
>> Hmm, even if we ask fw not to write back the fence to the descriptor,
>> IIRC current firmware will unconditionally write back the return status
>> of this non-blocking call, possibly overwriting the status of the
>> blocked call.
>>
>
> Yes, known problem with the interface that needs to be fixed.
>
>>> 		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
>>
>> btw, if we switch all requests to expect the reply to be sent back over
>> CTB, then we can possibly drop the send_mutex in CTB paths, and block
>> only when there is no DONT_WAIT flag and we have to wait for a response.
>>
>
> Rather just wait for the GuC to fix this.

But the fix will probably require some extra flags, while switching
to replies over CTB should work today on existing fw.

>
>>> 		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
>>> 	CT_DEBUG_DRIVER("CT: writing %*ph %*ph %*ph\n",
>>> -			4, &header, 4, &fence,
>>> +			4, &header, 4, &fence_value,
>>> 			4 * (len - 1), &action[1]);
>>> 	cmds[tail] = header;
>>> 	tail = (tail + 1) % size;
>>> -	cmds[tail] = fence;
>>> +	cmds[tail] = fence_value;
>>> 	tail = (tail + 1) % size;
>>> 	for (i = 1; i < len; i++) {
>>> @@ -440,6 +446,47 @@ static int wait_for_ct_request_update(struct  
>>> ct_request *req, u32 *status)
>>> 	return err;
>>> }
>>> +static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32  
>>> len)
>>> +{
>>> +	u32 head = READ_ONCE(desc->head);
>>> +	u32 space;
>>> +
>>> +	space = CIRC_SPACE(desc->tail, head, desc->size);
>>> +
>>> +	return space >= len;
>>> +}
>>> +
>>> +int intel_guc_send_nb(struct intel_guc_ct *ct,
>>> +		      const u32 *action,
>>> +		      u32 len)
>>> +{
>>> +	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>>> +	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> +	int err;
>>> +
>>> +	GEM_BUG_ON(!ctch->enabled);
>>> +	GEM_BUG_ON(!len);
>>> +	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>> +	lockdep_assert_held(&ct->send_lock);
>>
>> hmm, does it mean that it's now the caller's responsibility to take the
>> CT private spinlock? That is not how other guc_send() functions work.
>>
>
> Yes, that's how I would like this to work, as I feel it gives more
> flexibility to the caller in the -EBUSY case. The caller can call
> intel_guc_send_nb again while still holding the lock, or it can release
> the lock and use a different form of flow control. Perhaps locking /
> unlocking should be exposed via static inlines rather than the caller
> directly manipulating the lock?

Hmm, I'm not sure that such flexibility, where one set of callers is allowed
to grab and hold the CTB-internal send_lock while other callers don't even
know that such a lock exists, is a good direction.

Why do you need to sync your callers on a CTB-internal lock?
Are you afraid that other clients might steal the next free slot in the CTB?
Who are these other clients? Maybe you can lock on something else?

>
>>> +
>>> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>> +		ct->retry++;
>>> +		if (ct->retry >= MAX_RETRY)
>>> +			return -EDEADLK;
>>> +		else
>>> +			return -EBUSY;
>>> +	}
>>> +
>>> +	ct->retry = 0;
>>> +	err = ctb_write(ctb, action, len, 0, false, false);
>>> +	if (unlikely(err))
>>> +		return err;
>>> +
>>> +	intel_guc_notify(ct_to_guc(ct));
>>> +	return 0;
>>> +}
>>> +
>>> static int ctch_send(struct intel_guc_ct *ct,
>>> 		     struct intel_guc_ct_channel *ctch,
>>> 		     const u32 *action,
>>> @@ -460,17 +507,35 @@ static int ctch_send(struct intel_guc_ct *ct,
>>> 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>> 	GEM_BUG_ON(!response_buf && response_buf_size);
>>> +	/*
>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>> +	 * buffers are sized correctly the flow control condition should be
>>> +	 * rare.
>>> +	 */
>>> +retry:
>>> +	spin_lock_irqsave(&ct->send_lock, flags);
>>> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>> +		spin_unlock_irqrestore(&ct->send_lock, flags);
>>> +		ct->retry++;
>>> +		if (ct->retry >= MAX_RETRY)
>>> +			return -EDEADLK;
>>
>> I'm not sure what's better: to have a hidden deadlock that is hard to
>> reproduce, or deadlocks that are easier to catch and so help us become
>> deadlock-clean
>>
>
> This is covering the case where the GuC has died, to avoid deadlock.
> Eventually we will have some GuC health check code that will trigger a
> full GPU reset if the GuC has died. We need a way for code spinning on the
> CTBs to exit.
>
> I've already tweaked this code locally a bit to use an atomic too, with the
> idea being that the GuC health check code can set this value to have all
> code spinning on CTBs immediately return -EDEADLK when the GuC has died.

I'll wait to see the next revision, then.
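For reference while that revision is pending, the flow-control logic under
discussion can be sketched in plain user-space C as below. It is an
illustration under stated assumptions, not the patch: sketch_ring_space()
repeats the arithmetic of CIRC_SPACE() from <linux/circ_buf.h> (which
requires a power-of-two size), SKETCH_MAX_RETRY is deliberately tiny where
the patch uses 0x1000000, and the struct, the function names, and the "dead"
flag (standing in for the health-check hook mentioned above) are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_RETRY 4u	/* tiny on purpose; the patch uses 0x1000000 */

/* Hypothetical, simplified view of a send buffer descriptor. */
struct sketch_ctb {
	uint32_t write;	/* producer offset in bytes (desc->tail in the patch) */
	uint32_t read;	/* consumer offset in bytes (desc->head in the patch) */
	uint32_t size;	/* buffer size in bytes, must be a power of two */
	uint32_t retry;	/* consecutive "no room" results */
	bool dead;	/* hypothetical flag a health check could set */
};

/* Free bytes in the ring, keeping one byte unused to tell full from empty. */
static uint32_t sketch_ring_space(uint32_t write, uint32_t read, uint32_t size)
{
	return (read - (write + 1)) & (size - 1);
}

/*
 * Try to reserve room for a message of len_dw dwords plus one extra dword,
 * matching the (len + 1) * 4 check in the patch. Returns 0 on success,
 * -EBUSY if the caller may retry, -EDEADLK if it should give up.
 */
static int sketch_reserve(struct sketch_ctb *ctb, uint32_t len_dw)
{
	uint32_t need = (len_dw + 1) * 4;

	assert(ctb && len_dw);

	if (ctb->dead)
		return -EDEADLK;

	if (sketch_ring_space(ctb->write, ctb->read, ctb->size) < need) {
		if (++ctb->retry >= SKETCH_MAX_RETRY)
			return -EDEADLK;
		return -EBUSY;
	}

	ctb->retry = 0;
	ctb->write = (ctb->write + need) & (ctb->size - 1);
	return 0;
}
```

A caller that gets -EBUSY can either retry or fall back to some other form
of back-pressure; -EDEADLK is the signal to stop spinning altogether.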

>
>>> +		cpu_relax();
>>> +		goto retry;
>>> +	}
>>> +
>>> +	ct->retry = 0;
>>> 	fence = ctch_get_next_fence(ctch);
>>> 	request.fence = fence;
>>> 	request.status = 0;
>>> 	request.response_len = response_buf_size;
>>> 	request.response_buf = response_buf;
>>> -	spin_lock_irqsave(&ct->lock, flags);
>>> +	spin_lock(&ct->request_lock);
>>> 	list_add_tail(&request.link, &ct->pending_requests);
>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>> +	spin_unlock(&ct->request_lock);
>>> -	err = ctb_write(ctb, action, len, fence, !!response_buf);
>>> +	err = ctb_write(ctb, action, len, fence, true, !!response_buf);
>>> +	spin_unlock_irqrestore(&ct->send_lock, flags);
>>> 	if (unlikely(err))
>>> 		goto unlink;
>>> @@ -501,9 +566,9 @@ static int ctch_send(struct intel_guc_ct *ct,
>>> 	}
>>> unlink:
>>> -	spin_lock_irqsave(&ct->lock, flags);
>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>> 	list_del(&request.link);
>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>> 	return err;
>>> }
>>> @@ -653,7 +718,7 @@ static int ct_handle_response(struct intel_guc_ct  
>>> *ct, const u32 *msg)
>>> 	CT_DEBUG_DRIVER("CT: response fence %u status %#x\n", fence, status);
>>> -	spin_lock(&ct->lock);
>>> +	spin_lock(&ct->request_lock);
>>> 	list_for_each_entry(req, &ct->pending_requests, link) {
>>> 		if (unlikely(fence != req->fence)) {
>>> 			CT_DEBUG_DRIVER("CT: request %u awaits response\n",
>>> @@ -672,7 +737,7 @@ static int ct_handle_response(struct intel_guc_ct  
>>> *ct, const u32 *msg)
>>> 		found = true;
>>> 		break;
>>> 	}
>>> -	spin_unlock(&ct->lock);
>>> +	spin_unlock(&ct->request_lock);
>>> 	if (!found)
>>> 		DRM_ERROR("CT: unsolicited response %*ph\n", 4 * msglen, msg);
>>> @@ -710,13 +775,13 @@ static bool ct_process_incoming_requests(struct  
>>> intel_guc_ct *ct)
>>> 	u32 *payload;
>>> 	bool done;
>>> -	spin_lock_irqsave(&ct->lock, flags);
>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>> 	request = list_first_entry_or_null(&ct->incoming_requests,
>>> 					   struct ct_incoming_request, link);
>>> 	if (request)
>>> 		list_del(&request->link);
>>> 	done = !!list_empty(&ct->incoming_requests);
>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>> 	if (!request)
>>> 		return true;
>>> @@ -777,9 +842,9 @@ static int ct_handle_request(struct intel_guc_ct  
>>> *ct, const u32 *msg)
>>> 	}
>>> 	memcpy(request->msg, msg, 4 * msglen);
>>> -	spin_lock_irqsave(&ct->lock, flags);
>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>> 	list_add_tail(&request->link, &ct->incoming_requests);
>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>> 	queue_work(system_unbound_wq, &ct->worker);
>>> 	return 0;
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h  
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> index 7c24d83f5c24..bc670a796bd8 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> @@ -62,8 +62,11 @@ struct intel_guc_ct {
>>> 	struct intel_guc_ct_channel host_channel;
>>> 	/* other channels are tbd */
>>> -	/** @lock: protects pending requests list */
>>> -	spinlock_t lock;
>>> +	/** @request_lock: protects pending requests list */
>>> +	spinlock_t request_lock;
>>> +
>>> +	/** @send_lock: protects h2g channel */
>>> +	spinlock_t send_lock;
>>> 	/** @pending_requests: list of requests waiting for response */
>>> 	struct list_head pending_requests;
>>> @@ -73,6 +76,9 @@ struct intel_guc_ct {
>>> 	/** @worker: worker for handling incoming requests */
>>> 	struct work_struct worker;
>>> +
>>> +	/** @retry: the number of times a H2G CTB has been retried */
>>> +	u32 retry;
>>> };
>>> void intel_guc_ct_init_early(struct intel_guc_ct *ct);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Add non blocking CTB send function
@ 2019-11-25 13:20         ` Michal Wajdeczko
  0 siblings, 0 replies; 36+ messages in thread
From: Michal Wajdeczko @ 2019-11-25 13:20 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Intel-GFX

On Fri, 22 Nov 2019 01:13:25 +0100, Matthew Brost  
<matthew.brost@intel.com> wrote:

> On Thu, Nov 21, 2019 at 12:43:26PM +0100, Michal Wajdeczko wrote:
>> On Thu, 21 Nov 2019 00:56:02 +0100, <John.C.Harrison@intel.com> wrote:
>>
>>> From: Matthew Brost <matthew.brost@intel.com>
>>>
>>> Add non blocking CTB send fuction, intel_guc_send_nb. In order to
>>> support a non blocking CTB send fuction a spin lock is needed to
>>
>> 2x typos
>>
>>> protect the CTB descriptors fields. Also the non blocking call must not
>>> update the fence value as this value is owned by the blocking call
>>> (intel_guc_send).
>>
>> you probably mean "intel_guc_send_ct", as intel_guc_send is just a wrapper
>> around guc->send
>>
>
> Ah, yes.
>
>>>
>>> The blocking CTB now must have a flow control mechanism to ensure the
>>> buffer isn't overrun. A lazy spin wait is used as we believe the flow
>>> control condition should be rare with properly sized buffer. A retry
>>> counter is also implemented which fails H2G CTBs once a limit is
>>> reached to prevent deadlock.
>>>
>>> The function, intel_guc_send_nb, is exported in this patch but unused.
>>> Several patches later in the series make use of this function.
>>
>> It's likely in yet another series
>>
>
> Yes, it is.
>
>>>
>>> Cc: John Harrison <john.c.harrison@intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>> drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  2 +
>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 97 +++++++++++++++++++----
>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 10 ++-
>>> 3 files changed, 91 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h  
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> index e6400204a2bd..77c5af919ace 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> @@ -94,6 +94,8 @@ intel_guc_send_and_receive(struct intel_guc *guc,  
>>> const u32 *action, u32 len,
>>> 	return guc->send(guc, action, len, response_buf, response_buf_size);
>>> }
>>> +int intel_guc_send_nb(struct intel_guc_ct *ct, const u32 *action, u32  
>>> len);
>>> +
>>
>> Hmm, this mismatch of guc/ct parameter breaks the our layering.
>> But we can keep this layering intact by introducing some flags to
>> the existing guc_send() function. These flags could be passed as
>> high bits in action[0], like this:
>
> This seems reasonable.
>
>>
>> #define GUC_ACTION_FLAG_DONT_WAIT 0x80000000
>>
>> int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
>> {
>> 	u32 action[] = {
>> 		INTEL_GUC_ACTION_AUTHENTICATE_HUC | GUC_ACTION_FLAG_DONT_WAIT,
>> 		rsa_offset
>> 	};
>>
>> 	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>> }
>>
>> then actual back-end of guc->send can take proper steps based on this  
>> flag:
>>
>> @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action,  
>> u32 len,
>>        GEM_BUG_ON(!len);
>>        GEM_BUG_ON(len > guc->send_regs.count);
>>
>> +       if (*action & GUC_ACTION_FLAG_DONT_WAIT)
>> +               return -EINVAL;
>> +       *action &= ~GUC_ACTION_FLAG_DONT_WAIT;
>> +
>>        /* We expect only action code */
>>        GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK);
>>
>> @@ @@ int intel_guc_send_ct(struct intel_guc *guc, const u32 *action,  
>> u32 len,
>>        u32 status = ~0; /* undefined */
>>        int ret;
>>
>> +       if (*action & GUC_ACTION_FLAG_DONT_WAIT) {
>> +               GEM_BUG_ON(response_buf);
>> +               GEM_BUG_ON(response_buf_size);
>> +               return ctch_send_nb(ct, ctch, action, len);
>> +       }
>> +
>>        mutex_lock(&guc->send_mutex);
>>
>>        ret = ctch_send(ct, ctch, action, len, response_buf,  
>> response_buf_size,
>>
>>
>>> static inline void intel_guc_notify(struct intel_guc *guc)
>>> {
>>> 	guc->notify(guc);
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c  
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index b49115517510..e50d968b15d5 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -3,6 +3,8 @@
>>>  * Copyright © 2016-2019 Intel Corporation
>>>  */
>>> +#include <linux/circ_buf.h>
>>> +
>>> #include "i915_drv.h"
>>> #include "intel_guc_ct.h"
>>> @@ -12,6 +14,8 @@
>>> #define CT_DEBUG_DRIVER(...)	do { } while (0)
>>> #endif
>>> +#define MAX_RETRY		0x1000000
>>> +
>>> struct ct_request {
>>> 	struct list_head link;
>>> 	u32 fence;
>>> @@ -40,7 +44,8 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>>> 	/* we're using static channel owners */
>>> 	ct->host_channel.owner = CTB_OWNER_HOST;
>>> -	spin_lock_init(&ct->lock);
>>> +	spin_lock_init(&ct->request_lock);
>>> +	spin_lock_init(&ct->send_lock);
>>> 	INIT_LIST_HEAD(&ct->pending_requests);
>>> 	INIT_LIST_HEAD(&ct->incoming_requests);
>>> 	INIT_WORK(&ct->worker, ct_incoming_request_worker_func);
>>> @@ -291,7 +296,8 @@ static u32 ctch_get_next_fence(struct  
>>> intel_guc_ct_channel *ctch)
>>> static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>> 		     const u32 *action,
>>> 		     u32 len /* in dwords */,
>>> -		     u32 fence,
>>> +		     u32 fence_value,
>>> +		     bool enable_fence,
>>
>> maybe we can just guarantee that fence=0 will never be used as a valid
>> fence id, then this flag could be replaced with (fence != 0) check.
>>
>
> Yes, again seems reasonable. Initialize next_fence = 1, then increment by 2 each
> time and this works.

As we need a fence only for the low volume of blocking messages that require
a response, I'm not sure that we need to super-optimize get_fence to use +2
all the time - maybe just check for zero wrap?
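
A minimal sketch of such a zero-wrap check, not the driver's actual
ctch_get_next_fence(). The struct and function names here are hypothetical
stand-ins; the only idea taken from the discussion is that fence 0 is
reserved to mean "no fence requested":

```c
#include <stdint.h>

/* Hypothetical stand-in for the channel state; only the counter matters. */
struct fence_state {
	uint32_t next_fence;
};

/*
 * Return the next fence id, never handing out 0, so that a fence of 0
 * can be treated as "do not write the fence back". A plain +1 increment
 * is kept; only the wrap-around case needs the extra step.
 */
static uint32_t get_next_fence_nonzero(struct fence_state *st)
{
	uint32_t fence = ++st->next_fence;

	if (fence == 0)		/* counter wrapped: skip the reserved value */
		fence = ++st->next_fence;

	return fence;
}
```

With this shape the extra branch is taken once every 2^32 allocations, so
it should cost nothing measurable on the blocking path.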

>
>>> 		     bool want_response)
>>> {
>>> 	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> @@ -328,18 +334,18 @@ static int ctb_write(struct intel_guc_ct_buffer  
>>> *ctb,
>>> 	 * DW2+: action data
>>> 	 */
>>> 	header = (len << GUC_CT_MSG_LEN_SHIFT) |
>>> -		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
>>> +		 (enable_fence ? GUC_CT_MSG_WRITE_FENCE_TO_DESC : 0) |
>>
>> Hmm, even if we ask fw to do not write back fence to the descriptor,
>> IIRC current firmware will unconditionally write back return status
>> of this non-blocking call, possibly overwriting status of the blocked
>> call.
>>
>
> Yes, known problem with the interface that needs to be fixed.
>
>>> 		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
>>
>> btw, if we switch all requests to expect reply send back over CTB,
>> then we can possibly drop the send_mutex in CTB paths, and block
>> only when there is no DONT_WAIT flag and we have to wait for response.
>>
>
> Rather just wait for the GuC to fix this.

But fixing will probably require some extra flags, while switching
to replies over CTB should work today on existing fw.

>
>>> 		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
>>> 	CT_DEBUG_DRIVER("CT: writing %*ph %*ph %*ph\n",
>>> -			4, &header, 4, &fence,
>>> +			4, &header, 4, &fence_value,
>>> 			4 * (len - 1), &action[1]);
>>> 	cmds[tail] = header;
>>> 	tail = (tail + 1) % size;
>>> -	cmds[tail] = fence;
>>> +	cmds[tail] = fence_value;
>>> 	tail = (tail + 1) % size;
>>> 	for (i = 1; i < len; i++) {
>>> @@ -440,6 +446,47 @@ static int wait_for_ct_request_update(struct  
>>> ct_request *req, u32 *status)
>>> 	return err;
>>> }
>>> +static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32  
>>> len)
>>> +{
>>> +	u32 head = READ_ONCE(desc->head);
>>> +	u32 space;
>>> +
>>> +	space = CIRC_SPACE(desc->tail, head, desc->size);
>>> +
>>> +	return space >= len;
>>> +}
>>> +
>>> +int intel_guc_send_nb(struct intel_guc_ct *ct,
>>> +		      const u32 *action,
>>> +		      u32 len)
>>> +{
>>> +	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>>> +	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>> +	int err;
>>> +
>>> +	GEM_BUG_ON(!ctch->enabled);
>>> +	GEM_BUG_ON(!len);
>>> +	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>> +	lockdep_assert_held(&ct->send_lock);
>>
>> hmm, does it mean that now it's caller responsibility to spinlock
>> on CT private lock ? That is not how other guc_send() functions work.
>>
>
> Yes, that's how I would like this to work as I feel like it gives more flexibility to
> the caller on the -EBUSY case. The caller can call intel_guc_send_nb again while
> still holding the lock or it can release the lock and use a different form of flow
> control. Perhaps locking / unlocking should be exposed via static inlines rather
> than the caller directly manipulating the lock?

Hmm, I'm not sure that such flexibility, where one set of callers is allowed
to grab and hold the CTB-internal send_lock while other callers don't even
know that such a lock exists, is a good direction.

Why do you need to sync your callers on the CTB-internal lock?
Are you afraid that other clients might steal the next free slot in the CTB?
Who are these other clients? Maybe you can lock on something else?

>
>>> +
>>> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>> +		ct->retry++;
>>> +		if (ct->retry >= MAX_RETRY)
>>> +			return -EDEADLK;
>>> +		else
>>> +			return -EBUSY;
>>> +	}
>>> +
>>> +	ct->retry = 0;
>>> +	err = ctb_write(ctb, action, len, 0, false, false);
>>> +	if (unlikely(err))
>>> +		return err;
>>> +
>>> +	intel_guc_notify(ct_to_guc(ct));
>>> +	return 0;
>>> +}
>>> +
>>> static int ctch_send(struct intel_guc_ct *ct,
>>> 		     struct intel_guc_ct_channel *ctch,
>>> 		     const u32 *action,
>>> @@ -460,17 +507,35 @@ static int ctch_send(struct intel_guc_ct *ct,
>>> 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>> 	GEM_BUG_ON(!response_buf && response_buf_size);
>>> +	/*
>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>> +	 * buffers are sized correctly the flow control condition should be
>>> +	 * rare.
>>> +	 */
>>> +retry:
>>> +	spin_lock_irqsave(&ct->send_lock, flags);
>>> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>> +		spin_unlock_irqrestore(&ct->send_lock, flags);
>>> +		ct->retry++;
>>> +		if (ct->retry >= MAX_RETRY)
>>> +			return -EDEADLK;
>>
>> I'm not sure what's better: have secret deadlock hard to reproduce,
>> or deadlocks easier to catch that helps improve to be deadlock-clean
>>
>
> This is covering the case where the GuC has died and to avoid deadlock. Eventually
> we will have some GuC health check code that will trigger a full GPU reset if
> the GuC has died. We need a way for code spinning on the CTBs to exit.
>
> I've already tweaked this code locally a bit to use an atomic too with the idea
> being the GuC health check code can set this value to have all code spinning on
> CTBs immediately return -EDEADLK when the GuC has died.

Will wait then to see next revision.

>
>>> +		cpu_relax();
>>> +		goto retry;
>>> +	}
>>> +
>>> +	ct->retry = 0;
>>> 	fence = ctch_get_next_fence(ctch);
>>> 	request.fence = fence;
>>> 	request.status = 0;
>>> 	request.response_len = response_buf_size;
>>> 	request.response_buf = response_buf;
>>> -	spin_lock_irqsave(&ct->lock, flags);
>>> +	spin_lock(&ct->request_lock);
>>> 	list_add_tail(&request.link, &ct->pending_requests);
>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>> +	spin_unlock(&ct->request_lock);
>>> -	err = ctb_write(ctb, action, len, fence, !!response_buf);
>>> +	err = ctb_write(ctb, action, len, fence, true, !!response_buf);
>>> +	spin_unlock_irqrestore(&ct->send_lock, flags);
>>> 	if (unlikely(err))
>>> 		goto unlink;
>>> @@ -501,9 +566,9 @@ static int ctch_send(struct intel_guc_ct *ct,
>>> 	}
>>> unlink:
>>> -	spin_lock_irqsave(&ct->lock, flags);
>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>> 	list_del(&request.link);
>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>> 	return err;
>>> }
>>> @@ -653,7 +718,7 @@ static int ct_handle_response(struct intel_guc_ct  
>>> *ct, const u32 *msg)
>>> 	CT_DEBUG_DRIVER("CT: response fence %u status %#x\n", fence, status);
>>> -	spin_lock(&ct->lock);
>>> +	spin_lock(&ct->request_lock);
>>> 	list_for_each_entry(req, &ct->pending_requests, link) {
>>> 		if (unlikely(fence != req->fence)) {
>>> 			CT_DEBUG_DRIVER("CT: request %u awaits response\n",
>>> @@ -672,7 +737,7 @@ static int ct_handle_response(struct intel_guc_ct  
>>> *ct, const u32 *msg)
>>> 		found = true;
>>> 		break;
>>> 	}
>>> -	spin_unlock(&ct->lock);
>>> +	spin_unlock(&ct->request_lock);
>>> 	if (!found)
>>> 		DRM_ERROR("CT: unsolicited response %*ph\n", 4 * msglen, msg);
>>> @@ -710,13 +775,13 @@ static bool ct_process_incoming_requests(struct  
>>> intel_guc_ct *ct)
>>> 	u32 *payload;
>>> 	bool done;
>>> -	spin_lock_irqsave(&ct->lock, flags);
>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>> 	request = list_first_entry_or_null(&ct->incoming_requests,
>>> 					   struct ct_incoming_request, link);
>>> 	if (request)
>>> 		list_del(&request->link);
>>> 	done = !!list_empty(&ct->incoming_requests);
>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>> 	if (!request)
>>> 		return true;
>>> @@ -777,9 +842,9 @@ static int ct_handle_request(struct intel_guc_ct  
>>> *ct, const u32 *msg)
>>> 	}
>>> 	memcpy(request->msg, msg, 4 * msglen);
>>> -	spin_lock_irqsave(&ct->lock, flags);
>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>> 	list_add_tail(&request->link, &ct->incoming_requests);
>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>> 	queue_work(system_unbound_wq, &ct->worker);
>>> 	return 0;
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h  
>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> index 7c24d83f5c24..bc670a796bd8 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> @@ -62,8 +62,11 @@ struct intel_guc_ct {
>>> 	struct intel_guc_ct_channel host_channel;
>>> 	/* other channels are tbd */
>>> -	/** @lock: protects pending requests list */
>>> -	spinlock_t lock;
>>> +	/** @request_lock: protects pending requests list */
>>> +	spinlock_t request_lock;
>>> +
>>> +	/** @send_lock: protects h2g channel */
>>> +	spinlock_t send_lock;
>>> 	/** @pending_requests: list of requests waiting for response */
>>> 	struct list_head pending_requests;
>>> @@ -73,6 +76,9 @@ struct intel_guc_ct {
>>> 	/** @worker: worker for handling incoming requests */
>>> 	struct work_struct worker;
>>> +
>>> +	/** @retry: the number of times a H2G CTB has been retried */
>>> +	u32 retry;
>>> };
>>> void intel_guc_ct_init_early(struct intel_guc_ct *ct);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] drm/i915/guc: Add non blocking CTB send function
@ 2019-11-25 13:39           ` Michal Wajdeczko
  0 siblings, 0 replies; 36+ messages in thread
From: Michal Wajdeczko @ 2019-11-25 13:39 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Intel-GFX

On Fri, 22 Nov 2019 02:34:22 +0100, Matthew Brost  
<matthew.brost@intel.com> wrote:

> On Thu, Nov 21, 2019 at 04:13:25PM -0800, Matthew Brost wrote:
>> On Thu, Nov 21, 2019 at 12:43:26PM +0100, Michal Wajdeczko wrote:
>>> On Thu, 21 Nov 2019 00:56:02 +0100, <John.C.Harrison@intel.com> wrote:
>>>
>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>
>>>> Add non blocking CTB send fuction, intel_guc_send_nb. In order to
>>>> support a non blocking CTB send fuction a spin lock is needed to
>>>
>>> 2x typos
>>>
>>>> protect the CTB descriptors fields. Also the non blocking call must  
>>>> not
>>>> update the fence value as this value is owned by the blocking call
>>>> (intel_guc_send).
>>>
>>> you probably mean "intel_guc_send_ct", as intel_guc_send is just a wrapper
>>> around guc->send
>>>
>>
>> Ah, yes.
>>
>>>>
>>>> The blocking CTB now must have a flow control mechanism to ensure the
>>>> buffer isn't overrun. A lazy spin wait is used as we believe the flow
>>>> control condition should be rare with properly sized buffer. A retry
>>>> counter is also implemented which fails H2G CTBs once a limit is
>>>> reached to prevent deadlock.
>>>>
>>>> The function, intel_guc_send_nb, is exported in this patch but unused.
>>>> Several patches later in the series make use of this function.
>>>
>>> It's likely in yet another series
>>>
>>
>> Yes, it is.
>>
>>>>
>>>> Cc: John Harrison <john.c.harrison@intel.com>
>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  2 +
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 97 +++++++++++++++++++----
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 10 ++-
>>>> 3 files changed, 91 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h  
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> index e6400204a2bd..77c5af919ace 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> @@ -94,6 +94,8 @@ intel_guc_send_and_receive(struct intel_guc *guc,  
>>>> const u32 *action, u32 len,
>>>> 	return guc->send(guc, action, len, response_buf, response_buf_size);
>>>> }
>>>> +int intel_guc_send_nb(struct intel_guc_ct *ct, const u32 *action,  
>>>> u32 len);
>>>> +
>>>
>>> Hmm, this mismatch of guc/ct parameter breaks the our layering.
>>> But we can keep this layering intact by introducing some flags to
>>> the existing guc_send() function. These flags could be passed as
>>> high bits in action[0], like this:
>>
>> This seems reasonable.
>>
>
> Prototyped this and I don't like it at all. First 'action' is a const so what you
> are suggesting doesn't work unless that is changed.

I guess only the explicit clearing of that flag from the const action was a
mistake; the rest should be ok.

> Also what if all bits in DW
> eventually mean something,

In all guc_send functions we only use lower 16 bits for action code.
This action code is then passed in MMIO messages in bits 15:0 and in
CTB messages in bits 31:16.

> to me overloading a field isn't a good idea if
> anything we should add another argument to guc->send().

guc->send already takes 5 params:

	int (*send)(struct intel_guc *guc, const u32 *data, u32 len,
		    u32 *response_buf, u32 response_buf_size);

We can add explicit flags, but I see nothing wrong in using free top
bits from data[0] to pass these flags there.
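
A minimal sketch of how a back-end could read such a flag without writing
to the const buffer. The flag value is the one proposed earlier in this
thread; the mask, shift and helper names are assumptions made for
illustration (based only on the "lower 16 bits" and "bits 31:16"
statements above), not existing i915 definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Value proposed in this thread for the "don't wait" flag. */
#define ACTION_FLAG_DONT_WAIT	0x80000000u
/* Assumed: only the lower 16 bits of data[0] carry the action code. */
#define ACTION_CODE_MASK	0x0000ffffu
/* Assumed: the CTB header carries the action code in bits 31:16. */
#define CTB_ACTION_SHIFT	16

/* Test the flag; data[] stays const and is never modified. */
static bool action_dont_wait(const uint32_t *data)
{
	return (data[0] & ACTION_FLAG_DONT_WAIT) != 0;
}

/* Extract the bare action code, dropping any flag bits. */
static uint32_t action_code(const uint32_t *data)
{
	return data[0] & ACTION_CODE_MASK;
}

/* Build the action part of a CTB header from the masked code only. */
static uint32_t ctb_header_action(const uint32_t *data)
{
	return action_code(data) << CTB_ACTION_SHIFT;
}
```

Masking on read, rather than clearing the bit in place, keeps the const
qualifier on the action buffer and stops the flag from leaking into the
header bits.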

> But I'd honestly prefer
> we just leave it as is. Non-blocking only applies to CTs (not MMIO) and we have
> GEM_BUG_ON to protect us if this function is called incorrectly.

Well, I guess many would still prefer to be hit by WARN from guc-send-nop
rather than BUG_ON from guc-send-nb

> Doing what you
> suggest just makes everything more complicated IMO.

I'm not sold that adding GUC_ACTION_FLAG_DONT_WAIT to action[0]
(as shown below) is that complicated

Michal

>
> Matt
>
>>>
>>> #define GUC_ACTION_FLAG_DONT_WAIT 0x80000000
>>>
>>> int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
>>> {
>>> 	u32 action[] = {
>>> 		INTEL_GUC_ACTION_AUTHENTICATE_HUC | GUC_ACTION_FLAG_DONT_WAIT,
>>> 		rsa_offset
>>> 	};
>>>
>>> 	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>>> }
>>>
>>> then actual back-end of guc->send can take proper steps based on this  
>>> flag:
>>>
>>> @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action,  
>>> u32 len,
>>>       GEM_BUG_ON(!len);
>>>       GEM_BUG_ON(len > guc->send_regs.count);
>>>
>>> +       if (*action & GUC_ACTION_FLAG_DONT_WAIT)
>>> +               return -EINVAL;
>>> +       *action &= ~GUC_ACTION_FLAG_DONT_WAIT;
>>> +
>>>       /* We expect only action code */
>>>       GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK);
>>>
>>> @@ @@ int intel_guc_send_ct(struct intel_guc *guc, const u32 *action,  
>>> u32 len,
>>>       u32 status = ~0; /* undefined */
>>>       int ret;
>>>
>>> +       if (*action & GUC_ACTION_FLAG_DONT_WAIT) {
>>> +               GEM_BUG_ON(response_buf);
>>> +               GEM_BUG_ON(response_buf_size);
>>> +               return ctch_send_nb(ct, ctch, action, len);
>>> +       }
>>> +
>>>       mutex_lock(&guc->send_mutex);
>>>
>>>       ret = ctch_send(ct, ctch, action, len, response_buf,  
>>> response_buf_size,
>>>
>>>
>>>> static inline void intel_guc_notify(struct intel_guc *guc)
>>>> {
>>>> 	guc->notify(guc);
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c  
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> index b49115517510..e50d968b15d5 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> @@ -3,6 +3,8 @@
>>>> * Copyright © 2016-2019 Intel Corporation
>>>> */
>>>> +#include <linux/circ_buf.h>
>>>> +
>>>> #include "i915_drv.h"
>>>> #include "intel_guc_ct.h"
>>>> @@ -12,6 +14,8 @@
>>>> #define CT_DEBUG_DRIVER(...)	do { } while (0)
>>>> #endif
>>>> +#define MAX_RETRY		0x1000000
>>>> +
>>>> struct ct_request {
>>>> 	struct list_head link;
>>>> 	u32 fence;
>>>> @@ -40,7 +44,8 @@ void intel_guc_ct_init_early(struct intel_guc_ct  
>>>> *ct)
>>>> 	/* we're using static channel owners */
>>>> 	ct->host_channel.owner = CTB_OWNER_HOST;
>>>> -	spin_lock_init(&ct->lock);
>>>> +	spin_lock_init(&ct->request_lock);
>>>> +	spin_lock_init(&ct->send_lock);
>>>> 	INIT_LIST_HEAD(&ct->pending_requests);
>>>> 	INIT_LIST_HEAD(&ct->incoming_requests);
>>>> 	INIT_WORK(&ct->worker, ct_incoming_request_worker_func);
>>>> @@ -291,7 +296,8 @@ static u32 ctch_get_next_fence(struct  
>>>> intel_guc_ct_channel *ctch)
>>>> static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>>> 		     const u32 *action,
>>>> 		     u32 len /* in dwords */,
>>>> -		     u32 fence,
>>>> +		     u32 fence_value,
>>>> +		     bool enable_fence,
>>>
>>> maybe we can just guarantee that fence=0 will never be used as a valid
>>> fence id, then this flag could be replaced with (fence != 0) check.
>>>
>>
>> Yes, again seems reasonable. Initialize next_fence = 1, then increment by 2 each
>> time and this works.
>>
>>>> 		     bool want_response)
>>>> {
>>>> 	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> @@ -328,18 +334,18 @@ static int ctb_write(struct intel_guc_ct_buffer  
>>>> *ctb,
>>>> 	 * DW2+: action data
>>>> 	 */
>>>> 	header = (len << GUC_CT_MSG_LEN_SHIFT) |
>>>> -		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
>>>> +		 (enable_fence ? GUC_CT_MSG_WRITE_FENCE_TO_DESC : 0) |
>>>
>>> Hmm, even if we ask fw to do not write back fence to the descriptor,
>>> IIRC current firmware will unconditionally write back return status
>>> of this non-blocking call, possibly overwriting status of the blocked
>>> call.
>>>
>>
>> Yes, known problem with the interface that needs to be fixed.
>>
>>>> 		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
>>>
>>> btw, if we switch all requests to expect reply send back over CTB,
>>> then we can possibly drop the send_mutex in CTB paths, and block
>>> only when there is no DONT_WAIT flag and we have to wait for response.
>>>
>>
>> Rather just wait for the GuC to fix this.
>>
>>>> 		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
>>>> 	CT_DEBUG_DRIVER("CT: writing %*ph %*ph %*ph\n",
>>>> -			4, &header, 4, &fence,
>>>> +			4, &header, 4, &fence_value,
>>>> 			4 * (len - 1), &action[1]);
>>>> 	cmds[tail] = header;
>>>> 	tail = (tail + 1) % size;
>>>> -	cmds[tail] = fence;
>>>> +	cmds[tail] = fence_value;
>>>> 	tail = (tail + 1) % size;
>>>> 	for (i = 1; i < len; i++) {
>>>> @@ -440,6 +446,47 @@ static int wait_for_ct_request_update(struct  
>>>> ct_request *req, u32 *status)
>>>> 	return err;
>>>> }
>>>> +static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32  
>>>> len)
>>>> +{
>>>> +	u32 head = READ_ONCE(desc->head);
>>>> +	u32 space;
>>>> +
>>>> +	space = CIRC_SPACE(desc->tail, head, desc->size);
>>>> +
>>>> +	return space >= len;
>>>> +}
>>>> +
>>>> +int intel_guc_send_nb(struct intel_guc_ct *ct,
>>>> +		      const u32 *action,
>>>> +		      u32 len)
>>>> +{
>>>> +	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>>>> +	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> +	int err;
>>>> +
>>>> +	GEM_BUG_ON(!ctch->enabled);
>>>> +	GEM_BUG_ON(!len);
>>>> +	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>> +	lockdep_assert_held(&ct->send_lock);
>>>
>>> hmm, does it mean that now it's caller responsibility to spinlock
>>> on CT private lock ? That is not how other guc_send() functions work.
>>>
>>
>> Yes, that's how I would like this to work as I feel like it gives more flexibility to
>> the caller on the -EBUSY case. The caller can call intel_guc_send_nb again while
>> still holding the lock or it can release the lock and use a different form of flow
>> control. Perhaps locking / unlocking should be exposed via static inlines rather
>> than the caller directly manipulating the lock?
>>
>>>> +
>>>> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>> +		ct->retry++;
>>>> +		if (ct->retry >= MAX_RETRY)
>>>> +			return -EDEADLK;
>>>> +		else
>>>> +			return -EBUSY;
>>>> +	}
>>>> +
>>>> +	ct->retry = 0;
>>>> +	err = ctb_write(ctb, action, len, 0, false, false);
>>>> +	if (unlikely(err))
>>>> +		return err;
>>>> +
>>>> +	intel_guc_notify(ct_to_guc(ct));
>>>> +	return 0;
>>>> +}
>>>> +
>>>> static int ctch_send(struct intel_guc_ct *ct,
>>>> 		     struct intel_guc_ct_channel *ctch,
>>>> 		     const u32 *action,
>>>> @@ -460,17 +507,35 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>> 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>> 	GEM_BUG_ON(!response_buf && response_buf_size);
>>>> +	/*
>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>>> +	 * buffers are sized correctly the flow control condition should be
>>>> +	 * rare.
>>>> +	 */
>>>> +retry:
>>>> +	spin_lock_irqsave(&ct->send_lock, flags);
>>>> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>> +		spin_unlock_irqrestore(&ct->send_lock, flags);
>>>> +		ct->retry++;
>>>> +		if (ct->retry >= MAX_RETRY)
>>>> +			return -EDEADLK;
>>>
>>> I'm not sure what's better: have secret deadlock hard to reproduce,
>>> or deadlocks easier to catch that helps improve to be deadlock-clean
>>>
>>
>> This is covering the case where the GuC has died and to avoid deadlock. Eventually
>> we will have some GuC health check code that will trigger a full GPU reset if
>> the GuC has died. We need a way for code spinning on the CTBs to exit.
>>
>> I've already tweaked this code locally a bit to use an atomic too with the idea
>> being the GuC health check code can set this value to have all code spinning on
>> CTBs immediately return -EDEADLK when the GuC has died.
>>
>>>> +		cpu_relax();
>>>> +		goto retry;
>>>> +	}
>>>> +
>>>> +	ct->retry = 0;
>>>> 	fence = ctch_get_next_fence(ctch);
>>>> 	request.fence = fence;
>>>> 	request.status = 0;
>>>> 	request.response_len = response_buf_size;
>>>> 	request.response_buf = response_buf;
>>>> -	spin_lock_irqsave(&ct->lock, flags);
>>>> +	spin_lock(&ct->request_lock);
>>>> 	list_add_tail(&request.link, &ct->pending_requests);
>>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>>> +	spin_unlock(&ct->request_lock);
>>>> -	err = ctb_write(ctb, action, len, fence, !!response_buf);
>>>> +	err = ctb_write(ctb, action, len, fence, true, !!response_buf);
>>>> +	spin_unlock_irqrestore(&ct->send_lock, flags);
>>>> 	if (unlikely(err))
>>>> 		goto unlink;
>>>> @@ -501,9 +566,9 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>> 	}
>>>> unlink:
>>>> -	spin_lock_irqsave(&ct->lock, flags);
>>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>>> 	list_del(&request.link);
>>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>> 	return err;
>>>> }
>>>> @@ -653,7 +718,7 @@ static int ct_handle_response(struct intel_guc_ct  
>>>> *ct, const u32 *msg)
>>>> 	CT_DEBUG_DRIVER("CT: response fence %u status %#x\n", fence, status);
>>>> -	spin_lock(&ct->lock);
>>>> +	spin_lock(&ct->request_lock);
>>>> 	list_for_each_entry(req, &ct->pending_requests, link) {
>>>> 		if (unlikely(fence != req->fence)) {
>>>> 			CT_DEBUG_DRIVER("CT: request %u awaits response\n",
>>>> @@ -672,7 +737,7 @@ static int ct_handle_response(struct intel_guc_ct  
>>>> *ct, const u32 *msg)
>>>> 		found = true;
>>>> 		break;
>>>> 	}
>>>> -	spin_unlock(&ct->lock);
>>>> +	spin_unlock(&ct->request_lock);
>>>> 	if (!found)
>>>> 		DRM_ERROR("CT: unsolicited response %*ph\n", 4 * msglen, msg);
>>>> @@ -710,13 +775,13 @@ static bool ct_process_incoming_requests(struct  
>>>> intel_guc_ct *ct)
>>>> 	u32 *payload;
>>>> 	bool done;
>>>> -	spin_lock_irqsave(&ct->lock, flags);
>>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>>> 	request = list_first_entry_or_null(&ct->incoming_requests,
>>>> 					   struct ct_incoming_request, link);
>>>> 	if (request)
>>>> 		list_del(&request->link);
>>>> 	done = !!list_empty(&ct->incoming_requests);
>>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>> 	if (!request)
>>>> 		return true;
>>>> @@ -777,9 +842,9 @@ static int ct_handle_request(struct intel_guc_ct  
>>>> *ct, const u32 *msg)
>>>> 	}
>>>> 	memcpy(request->msg, msg, 4 * msglen);
>>>> -	spin_lock_irqsave(&ct->lock, flags);
>>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>>> 	list_add_tail(&request->link, &ct->incoming_requests);
>>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>> 	queue_work(system_unbound_wq, &ct->worker);
>>>> 	return 0;
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h  
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> index 7c24d83f5c24..bc670a796bd8 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> @@ -62,8 +62,11 @@ struct intel_guc_ct {
>>>> 	struct intel_guc_ct_channel host_channel;
>>>> 	/* other channels are tbd */
>>>> -	/** @lock: protects pending requests list */
>>>> -	spinlock_t lock;
>>>> +	/** @request_lock: protects pending requests list */
>>>> +	spinlock_t request_lock;
>>>> +
>>>> +	/** @send_lock: protects h2g channel */
>>>> +	spinlock_t send_lock;
>>>> 	/** @pending_requests: list of requests waiting for response */
>>>> 	struct list_head pending_requests;
>>>> @@ -73,6 +76,9 @@ struct intel_guc_ct {
>>>> 	/** @worker: worker for handling incoming requests */
>>>> 	struct work_struct worker;
>>>> +
>>>> +	/** @retry: the number of times a H2G CTB has been retried */
>>>> +	u32 retry;
>>>> };
>>>> void intel_guc_ct_init_early(struct intel_guc_ct *ct);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Add non blocking CTB send function
@ 2019-11-25 13:39           ` Michal Wajdeczko
  0 siblings, 0 replies; 36+ messages in thread
From: Michal Wajdeczko @ 2019-11-25 13:39 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Intel-GFX

On Fri, 22 Nov 2019 02:34:22 +0100, Matthew Brost  
<matthew.brost@intel.com> wrote:

> On Thu, Nov 21, 2019 at 04:13:25PM -0800, Matthew Brost wrote:
>> On Thu, Nov 21, 2019 at 12:43:26PM +0100, Michal Wajdeczko wrote:
>>> On Thu, 21 Nov 2019 00:56:02 +0100, <John.C.Harrison@intel.com> wrote:
>>>
>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>
>>>> Add non blocking CTB send fuction, intel_guc_send_nb. In order to
>>>> support a non blocking CTB send fuction a spin lock is needed to
>>>
>>> 2x typos
>>>
>>>> protect the CTB descriptor fields. Also the non blocking call must not
>>>> update the fence value as this value is owned by the blocking call
>>>> (intel_guc_send).
>>>
>>> you probably mean "intel_guc_send_ct", as intel_guc_send is just a wrapper
>>> around guc->send
>>>
>>
>> Ah, yes.
>>
>>>>
>>>> The blocking CTB now must have a flow control mechanism to ensure the
>>>> buffer isn't overrun. A lazy spin wait is used as we believe the flow
>>>> control condition should be rare with properly sized buffer. A retry
>>>> counter is also implemented which fails H2G CTBs once a limit is
>>>> reached to prevent deadlock.
>>>>
>>>> The function, intel_guc_send_nb, is exported in this patch but unused.
>>>> Several patches later in the series make use of this function.
>>>
>>> It's likely in yet another series
>>>
>>
>> Yes, it is.
>>
>>>>
>>>> Cc: John Harrison <john.c.harrison@intel.com>
>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  2 +
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 97 +++++++++++++++++++----
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 10 ++-
>>>> 3 files changed, 91 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h  
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> index e6400204a2bd..77c5af919ace 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> @@ -94,6 +94,8 @@ intel_guc_send_and_receive(struct intel_guc *guc,  
>>>> const u32 *action, u32 len,
>>>> 	return guc->send(guc, action, len, response_buf, response_buf_size);
>>>> }
>>>> +int intel_guc_send_nb(struct intel_guc_ct *ct, const u32 *action,  
>>>> u32 len);
>>>> +
>>>
>>> Hmm, this mismatch of guc/ct parameters breaks our layering.
>>> But we can keep this layering intact by introducing some flags to
>>> the existing guc_send() function. These flags could be passed as
>>> high bits in action[0], like this:
>>
>> This seems reasonable.
>>
>
> Prototyped this and I don't like it at all. First, 'action' is a const, so
> what you are suggesting doesn't work unless that is changed.

I guess only the explicit clearing of that flag from the const action was a
mistake; the rest should be OK.

> Also what if all bits in DW
> eventually mean something,

In all guc_send functions we only use lower 16 bits for action code.
This action code is then passed in MMIO messages in bits 15:0 and in
CTB messages in bits 31:16.

> to me overloading a field isn't a good idea; if
> anything, we should add another argument to guc->send().

guc->send already takes 5 params:

	int (*send)(struct intel_guc *guc, const u32 *data, u32 len,
		    u32 *response_buf, u32 response_buf_size);

We can add explicit flags, but I see nothing wrong in using free top
bits from data[0] to pass these flags there.
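
To make the bit budget concrete, here is a minimal userspace sketch, not
driver code. GUC_ACTION_FLAG_DONT_WAIT is the value proposed in this thread;
ACTION_CODE_MASK, CT_MSG_ACTION_SHIFT, struct guc_action_split,
decode_action() and ct_header_action() are hypothetical names and values,
assumed only for illustration from the 16-bit action code and the bits 31:16
placement described above. The flag is decoded into a local copy, so the
caller's const array is never modified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag value proposed in this thread: top bit of data[0]. */
#define GUC_ACTION_FLAG_DONT_WAIT 0x80000000u
/* Assumption: action codes fit in the low 16 bits of data[0]. */
#define ACTION_CODE_MASK          0x0000ffffu
/* Assumption: a CTB header carries the action code in bits 31:16. */
#define CT_MSG_ACTION_SHIFT       16

/* Hypothetical decoded view of data[0]. */
struct guc_action_split {
	uint32_t code;   /* action code with all flag bits removed */
	bool dont_wait;  /* true when a non-blocking send was requested */
};

/* Decode data[0] into a local copy; the const input is left untouched. */
static struct guc_action_split decode_action(const uint32_t *data)
{
	struct guc_action_split s = {
		.code = data[0] & ACTION_CODE_MASK,
		.dont_wait = (data[0] & GUC_ACTION_FLAG_DONT_WAIT) != 0,
	};

	return s;
}

/* Place a bare action code into bits 31:16 of a CTB header dword. */
static uint32_t ct_header_action(uint32_t code)
{
	/* Only the action code is expected here, never the flag bits. */
	assert((code & ~ACTION_CODE_MASK) == 0);
	return code << CT_MSG_ACTION_SHIFT;
}
```

A backend would call decode_action() once on entry, branch on dont_wait, and
pass only the bare code to ct_header_action(), so the flag never reaches the
message that is written to the buffer.
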

> But I'd honestly prefer
> we just leave it as is. Non-blocking only applies to CTs (not MMIO) and we
> have GEM_BUG_ON to protect us if this function is called incorrectly.

Well, I guess many would still prefer to be hit by a WARN from guc-send-nop
rather than a BUG_ON from guc-send-nb

> Doing what you
> suggest just makes everything more complicated IMO.

I'm not sold that adding GUC_ACTION_FLAG_DONT_WAIT to action[0]
(as shown below) is that complicated

Michal

>
> Matt
>
>>>
>>> #define GUC_ACTION_FLAG_DONT_WAIT 0x80000000
>>>
>>> int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
>>> {
>>> 	u32 action[] = {
>>> 		INTEL_GUC_ACTION_AUTHENTICATE_HUC | GUC_ACTION_FLAG_DONT_WAIT,
>>> 		rsa_offset
>>> 	};
>>>
>>> 	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>>> }
>>>
>>> then actual back-end of guc->send can take proper steps based on this  
>>> flag:
>>>
>>> @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action,  
>>> u32 len,
>>>       GEM_BUG_ON(!len);
>>>       GEM_BUG_ON(len > guc->send_regs.count);
>>>
>>> +       if (*action & GUC_ACTION_FLAG_DONT_WAIT)
>>> +               return -EINVAL;
>>> +       *action &= ~GUC_ACTION_FLAG_DONT_WAIT;
>>> +
>>>       /* We expect only action code */
>>>       GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK);
>>>
>>> @@ @@ int intel_guc_send_ct(struct intel_guc *guc, const u32 *action,  
>>> u32 len,
>>>       u32 status = ~0; /* undefined */
>>>       int ret;
>>>
>>> +       if (*action & GUC_ACTION_FLAG_DONT_WAIT) {
>>> +               GEM_BUG_ON(response_buf);
>>> +               GEM_BUG_ON(response_buf_size);
>>> +               return ctch_send_nb(ct, ctch, action, len);
>>> +       }
>>> +
>>>       mutex_lock(&guc->send_mutex);
>>>
>>>       ret = ctch_send(ct, ctch, action, len, response_buf,  
>>> response_buf_size,
>>>
>>>
>>>> static inline void intel_guc_notify(struct intel_guc *guc)
>>>> {
>>>> 	guc->notify(guc);
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c  
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> index b49115517510..e50d968b15d5 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> @@ -3,6 +3,8 @@
>>>> * Copyright © 2016-2019 Intel Corporation
>>>> */
>>>> +#include <linux/circ_buf.h>
>>>> +
>>>> #include "i915_drv.h"
>>>> #include "intel_guc_ct.h"
>>>> @@ -12,6 +14,8 @@
>>>> #define CT_DEBUG_DRIVER(...)	do { } while (0)
>>>> #endif
>>>> +#define MAX_RETRY		0x1000000
>>>> +
>>>> struct ct_request {
>>>> 	struct list_head link;
>>>> 	u32 fence;
>>>> @@ -40,7 +44,8 @@ void intel_guc_ct_init_early(struct intel_guc_ct  
>>>> *ct)
>>>> 	/* we're using static channel owners */
>>>> 	ct->host_channel.owner = CTB_OWNER_HOST;
>>>> -	spin_lock_init(&ct->lock);
>>>> +	spin_lock_init(&ct->request_lock);
>>>> +	spin_lock_init(&ct->send_lock);
>>>> 	INIT_LIST_HEAD(&ct->pending_requests);
>>>> 	INIT_LIST_HEAD(&ct->incoming_requests);
>>>> 	INIT_WORK(&ct->worker, ct_incoming_request_worker_func);
>>>> @@ -291,7 +296,8 @@ static u32 ctch_get_next_fence(struct  
>>>> intel_guc_ct_channel *ctch)
>>>> static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>>> 		     const u32 *action,
>>>> 		     u32 len /* in dwords */,
>>>> -		     u32 fence,
>>>> +		     u32 fence_value,
>>>> +		     bool enable_fence,
>>>
>>> maybe we can just guarantee that fence=0 will never be used as a valid
>>> fence id, then this flag could be replaced with (fence != 0) check.
>>>
>>
>> Yes, again seems reasonable. Initialize next_fence = 1, then increment by 2
>> each time and this works.
>>
>>>> 		     bool want_response)
>>>> {
>>>> 	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> @@ -328,18 +334,18 @@ static int ctb_write(struct intel_guc_ct_buffer  
>>>> *ctb,
>>>> 	 * DW2+: action data
>>>> 	 */
>>>> 	header = (len << GUC_CT_MSG_LEN_SHIFT) |
>>>> -		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
>>>> +		 (enable_fence ? GUC_CT_MSG_WRITE_FENCE_TO_DESC : 0) |
>>>
>>> Hmm, even if we ask the fw not to write the fence back to the descriptor,
>>> IIRC the current firmware will unconditionally write back the return status
>>> of this non-blocking call, possibly overwriting the status of the blocked
>>> call.
>>>
>>
>> Yes, known problem with the interface that needs to be fixed.
>>
>>>> 		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
>>>
>>> btw, if we switch all requests to expect the reply to be sent back over CTB,
>>> then we can possibly drop the send_mutex in CTB paths, and block
>>> only when there is no DONT_WAIT flag and we have to wait for a response.
>>>
>>
>> Rather just wait for the GuC to fix this.
>>
>>>> 		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
>>>> 	CT_DEBUG_DRIVER("CT: writing %*ph %*ph %*ph\n",
>>>> -			4, &header, 4, &fence,
>>>> +			4, &header, 4, &fence_value,
>>>> 			4 * (len - 1), &action[1]);
>>>> 	cmds[tail] = header;
>>>> 	tail = (tail + 1) % size;
>>>> -	cmds[tail] = fence;
>>>> +	cmds[tail] = fence_value;
>>>> 	tail = (tail + 1) % size;
>>>> 	for (i = 1; i < len; i++) {
>>>> @@ -440,6 +446,47 @@ static int wait_for_ct_request_update(struct  
>>>> ct_request *req, u32 *status)
>>>> 	return err;
>>>> }
>>>> +static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32  
>>>> len)
>>>> +{
>>>> +	u32 head = READ_ONCE(desc->head);
>>>> +	u32 space;
>>>> +
>>>> +	space = CIRC_SPACE(desc->tail, head, desc->size);
>>>> +
>>>> +	return space >= len;
>>>> +}
>>>> +
>>>> +int intel_guc_send_nb(struct intel_guc_ct *ct,
>>>> +		      const u32 *action,
>>>> +		      u32 len)
>>>> +{
>>>> +	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>>>> +	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> +	int err;
>>>> +
>>>> +	GEM_BUG_ON(!ctch->enabled);
>>>> +	GEM_BUG_ON(!len);
>>>> +	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>> +	lockdep_assert_held(&ct->send_lock);
>>>
>>> hmm, does it mean that it's now the caller's responsibility to take the
>>> CT private spinlock? That is not how other guc_send() functions work.
>>>
>>
>> Yes, that's how I would like this to work, as I feel it gives more
>> flexibility to the caller in the -EBUSY case. The caller can call
>> intel_guc_send_nb again while still holding the lock, or it can release the
>> lock and use a different form of flow control. Perhaps locking / unlocking
>> should be exposed via static inlines rather than the caller directly
>> manipulating the lock?
>>
>>>> +
>>>> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>> +		ct->retry++;
>>>> +		if (ct->retry >= MAX_RETRY)
>>>> +			return -EDEADLK;
>>>> +		else
>>>> +			return -EBUSY;
>>>> +	}
>>>> +
>>>> +	ct->retry = 0;
>>>> +	err = ctb_write(ctb, action, len, 0, false, false);
>>>> +	if (unlikely(err))
>>>> +		return err;
>>>> +
>>>> +	intel_guc_notify(ct_to_guc(ct));
>>>> +	return 0;
>>>> +}
>>>> +
>>>> static int ctch_send(struct intel_guc_ct *ct,
>>>> 		     struct intel_guc_ct_channel *ctch,
>>>> 		     const u32 *action,
>>>> @@ -460,17 +507,35 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>> 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>> 	GEM_BUG_ON(!response_buf && response_buf_size);
>>>> +	/*
>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>>> +	 * buffers are sized correctly the flow control condition should be
>>>> +	 * rare.
>>>> +	 */
>>>> +retry:
>>>> +	spin_lock_irqsave(&ct->send_lock, flags);
>>>> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>> +		spin_unlock_irqrestore(&ct->send_lock, flags);
>>>> +		ct->retry++;
>>>> +		if (ct->retry >= MAX_RETRY)
>>>> +			return -EDEADLK;
>>>
>>> I'm not sure which is better: a hidden deadlock that is hard to reproduce,
>>> or deadlocks that are easier to catch, which helps us become deadlock-clean.
>>>
>>
>> This covers the case where the GuC has died, to avoid a deadlock. Eventually
>> we will have some GuC health check code that will trigger a full GPU reset if
>> the GuC has died. We need a way for code spinning on the CTBs to exit.
>>
>> I've already tweaked this code locally a bit to use an atomic as well, the idea
>> being that the GuC health check code can set this value to make all code
>> spinning on CTBs immediately return -EDEADLK when the GuC has died.
>>
>>>> +		cpu_relax();
>>>> +		goto retry;
>>>> +	}
>>>> +
>>>> +	ct->retry = 0;
>>>> 	fence = ctch_get_next_fence(ctch);
>>>> 	request.fence = fence;
>>>> 	request.status = 0;
>>>> 	request.response_len = response_buf_size;
>>>> 	request.response_buf = response_buf;
>>>> -	spin_lock_irqsave(&ct->lock, flags);
>>>> +	spin_lock(&ct->request_lock);
>>>> 	list_add_tail(&request.link, &ct->pending_requests);
>>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>>> +	spin_unlock(&ct->request_lock);
>>>> -	err = ctb_write(ctb, action, len, fence, !!response_buf);
>>>> +	err = ctb_write(ctb, action, len, fence, true, !!response_buf);
>>>> +	spin_unlock_irqrestore(&ct->send_lock, flags);
>>>> 	if (unlikely(err))
>>>> 		goto unlink;
>>>> @@ -501,9 +566,9 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>> 	}
>>>> unlink:
>>>> -	spin_lock_irqsave(&ct->lock, flags);
>>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>>> 	list_del(&request.link);
>>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>> 	return err;
>>>> }
>>>> @@ -653,7 +718,7 @@ static int ct_handle_response(struct intel_guc_ct  
>>>> *ct, const u32 *msg)
>>>> 	CT_DEBUG_DRIVER("CT: response fence %u status %#x\n", fence, status);
>>>> -	spin_lock(&ct->lock);
>>>> +	spin_lock(&ct->request_lock);
>>>> 	list_for_each_entry(req, &ct->pending_requests, link) {
>>>> 		if (unlikely(fence != req->fence)) {
>>>> 			CT_DEBUG_DRIVER("CT: request %u awaits response\n",
>>>> @@ -672,7 +737,7 @@ static int ct_handle_response(struct intel_guc_ct  
>>>> *ct, const u32 *msg)
>>>> 		found = true;
>>>> 		break;
>>>> 	}
>>>> -	spin_unlock(&ct->lock);
>>>> +	spin_unlock(&ct->request_lock);
>>>> 	if (!found)
>>>> 		DRM_ERROR("CT: unsolicited response %*ph\n", 4 * msglen, msg);
>>>> @@ -710,13 +775,13 @@ static bool ct_process_incoming_requests(struct  
>>>> intel_guc_ct *ct)
>>>> 	u32 *payload;
>>>> 	bool done;
>>>> -	spin_lock_irqsave(&ct->lock, flags);
>>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>>> 	request = list_first_entry_or_null(&ct->incoming_requests,
>>>> 					   struct ct_incoming_request, link);
>>>> 	if (request)
>>>> 		list_del(&request->link);
>>>> 	done = !!list_empty(&ct->incoming_requests);
>>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>> 	if (!request)
>>>> 		return true;
>>>> @@ -777,9 +842,9 @@ static int ct_handle_request(struct intel_guc_ct  
>>>> *ct, const u32 *msg)
>>>> 	}
>>>> 	memcpy(request->msg, msg, 4 * msglen);
>>>> -	spin_lock_irqsave(&ct->lock, flags);
>>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>>> 	list_add_tail(&request->link, &ct->incoming_requests);
>>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>> 	queue_work(system_unbound_wq, &ct->worker);
>>>> 	return 0;
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h  
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> index 7c24d83f5c24..bc670a796bd8 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> @@ -62,8 +62,11 @@ struct intel_guc_ct {
>>>> 	struct intel_guc_ct_channel host_channel;
>>>> 	/* other channels are tbd */
>>>> -	/** @lock: protects pending requests list */
>>>> -	spinlock_t lock;
>>>> +	/** @request_lock: protects pending requests list */
>>>> +	spinlock_t request_lock;
>>>> +
>>>> +	/** @send_lock: protects h2g channel */
>>>> +	spinlock_t send_lock;
>>>> 	/** @pending_requests: list of requests waiting for response */
>>>> 	struct list_head pending_requests;
>>>> @@ -73,6 +76,9 @@ struct intel_guc_ct {
>>>> 	/** @worker: worker for handling incoming requests */
>>>> 	struct work_struct worker;
>>>> +
>>>> +	/** @retry: the number of times a H2G CTB has been retried */
>>>> +	u32 retry;
>>>> };
>>>> void intel_guc_ct_init_early(struct intel_guc_ct *ct);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] drm/i915/guc: Optimized CTB writes and reads
@ 2019-11-25 13:48           ` Michal Wajdeczko
  0 siblings, 0 replies; 36+ messages in thread
From: Michal Wajdeczko @ 2019-11-25 13:48 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Intel-GFX

On Fri, 22 Nov 2019 02:29:59 +0100, Matthew Brost  
<matthew.brost@intel.com> wrote:

> On Thu, Nov 21, 2019 at 07:56:07AM -0800, Matthew Brost wrote:
>> On Thu, Nov 21, 2019 at 12:58:50PM +0100, Michal Wajdeczko wrote:
>>> On Thu, 21 Nov 2019 00:56:03 +0100, <John.C.Harrison@intel.com> wrote:
>>>
>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>
>>>> CTB writes are now in the path of command submission and should be
>>>> optimized for performance. Rather than reading CTB descriptor values
>>>> (e.g. head, tail, size) which could result in accesses across the PCIe
>>>> bus, store shadow local copies and only read/write the descriptor
>>>> values when absolutely necessary.
>>>>
>>>> Cc: John Harrison <john.c.harrison@intel.com>
>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 79 +++++++++++------------
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  8 +++
>>>> 2 files changed, 45 insertions(+), 42 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c  
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> index e50d968b15d5..4d8a4c6afd71 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> @@ -68,23 +68,29 @@ static inline const char  
>>>> *guc_ct_buffer_type_to_str(u32 type)
>>>> 	}
>>>> }
>>>> -static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
>>>> +static void guc_ct_buffer_desc_init(struct intel_guc_ct_buffer *ctb,
>>>> 				    u32 cmds_addr, u32 size, u32 owner)
>>>
>>> as this function now takes ctb instead of desc, it should be renamed
>>> or made separate from guc_ct_buffer_desc_init
>>>
>>
>> Yes, makes sense.
>>
>>>> {
>>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> 	CT_DEBUG_DRIVER("CT: desc %p init addr=%#x size=%u owner=%u\n",
>>>> 			desc, cmds_addr, size, owner);
>>>> 	memset(desc, 0, sizeof(*desc));
>>>> 	desc->addr = cmds_addr;
>>>> -	desc->size = size;
>>>> +	ctb->size = desc->size = size;
>>>> 	desc->owner = owner;
>>>> +	ctb->tail = 0;
>>>> +	ctb->head = 0;
>>>> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>>>> }
>>>> -static void guc_ct_buffer_desc_reset(struct guc_ct_buffer_desc *desc)
>>>> +static void guc_ct_buffer_desc_reset(struct intel_guc_ct_buffer *ctb)
>>>
>>> same here
>>>
>>
>> Same.
>>
>>>> {
>>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> 	CT_DEBUG_DRIVER("CT: desc %p reset head=%u tail=%u\n",
>>>> 			desc, desc->head, desc->tail);
>>>> -	desc->head = 0;
>>>> -	desc->tail = 0;
>>>> +	ctb->head = desc->head = 0;
>>>> +	ctb->tail = desc->tail = 0;
>>>> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>>>> 	desc->is_in_error = 0;
>>>> }
>>>> @@ -220,7 +226,7 @@ static int ctch_enable(struct intel_guc *guc,
>>>> 	 */
>>>> 	for (i = 0; i < ARRAY_SIZE(ctch->ctbs); i++) {
>>>> 		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
>>>> -		guc_ct_buffer_desc_init(ctch->ctbs[i].desc,
>>>> +		guc_ct_buffer_desc_init(&ctch->ctbs[i],
>>>> 					base + PAGE_SIZE/4 * i + PAGE_SIZE/2,
>>>> 					PAGE_SIZE/4,
>>>> 					ctch->owner);
>>>> @@ -301,32 +307,16 @@ static int ctb_write(struct intel_guc_ct_buffer  
>>>> *ctb,
>>>> 		     bool want_response)
>>>> {
>>>> 	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> -	u32 head = desc->head / 4;	/* in dwords */
>>>> -	u32 tail = desc->tail / 4;	/* in dwords */
>>>> -	u32 size = desc->size / 4;	/* in dwords */
>>>> -	u32 used;			/* in dwords */
>>>> +	u32 tail = ctb->tail / 4;	/* in dwords */
>>>> +	u32 size = ctb->size / 4;	/* in dwords */
>>>> 	u32 header;
>>>> 	u32 *cmds = ctb->cmds;
>>>> 	unsigned int i;
>>>> -	GEM_BUG_ON(desc->size % 4);
>>>> -	GEM_BUG_ON(desc->head % 4);
>>>> -	GEM_BUG_ON(desc->tail % 4);
>>>> +	GEM_BUG_ON(ctb->size % 4);
>>>> +	GEM_BUG_ON(ctb->tail % 4);
>>>> 	GEM_BUG_ON(tail >= size);
>>>> -	/*
>>>> -	 * tail == head condition indicates empty. GuC FW does not support
>>>> -	 * using up the entire buffer to get tail == head meaning full.
>>>> -	 */
>>>> -	if (tail < head)
>>>> -		used = (size - head) + tail;
>>>> -	else
>>>> -		used = tail - head;
>>>> -
>>>> -	/* make sure there is a space including extra dw for the fence */
>>>> -	if (unlikely(used + len + 1 >= size))
>>>> -		return -ENOSPC;
>>>> -
>>>> 	/*
>>>> 	 * Write the message. The format is the following:
>>>> 	 * DW0: header (including action code)
>>>> @@ -354,15 +344,16 @@ static int ctb_write(struct intel_guc_ct_buffer  
>>>> *ctb,
>>>> 	}
>>>> 	/* now update desc tail (back in bytes) */
>>>> -	desc->tail = tail * 4;
>>>> -	GEM_BUG_ON(desc->tail > desc->size);
>>>> +	ctb->tail = desc->tail = tail * 4;
>>>> +	ctb->space -= (len + 1) * 4;
>>>> +	GEM_BUG_ON(ctb->tail > ctb->size);
>>>> 	return 0;
>>>> }
>>>> /**
>>>> * wait_for_ctb_desc_update - Wait for the CT buffer descriptor update.
>>>> - * @desc:	buffer descriptor
>>>> + * @ctb:	ctb buffer
>>>> * @fence:	response fence
>>>> * @status:	placeholder for status
>>>> *
>>>> @@ -376,11 +367,12 @@ static int ctb_write(struct intel_guc_ct_buffer  
>>>> *ctb,
>>>> * *	-ETIMEDOUT no response within hardcoded timeout
>>>> * *	-EPROTO no response, CT buffer is in error
>>>> */
>>>> -static int wait_for_ctb_desc_update(struct guc_ct_buffer_desc *desc,
>>>> +static int wait_for_ctb_desc_update(struct intel_guc_ct_buffer *ctb,
>>>> 				    u32 fence,
>>>> 				    u32 *status)
>>>> {
>>>> 	int err;
>>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> 	/*
>>>> 	 * Fast commands should complete in less than 10us, so sample quickly
>>>> @@ -401,7 +393,7 @@ static int wait_for_ctb_desc_update(struct  
>>>> guc_ct_buffer_desc *desc,
>>>> 			/* Something went wrong with the messaging, try to reset
>>>> 			 * the buffer and hope for the best
>>>> 			 */
>>>> -			guc_ct_buffer_desc_reset(desc);
>>>> +			guc_ct_buffer_desc_reset(ctb);
>>>> 			err = -EPROTO;
>>>> 		}
>>>> 	}
>>>> @@ -446,12 +438,17 @@ static int wait_for_ct_request_update(struct  
>>>> ct_request *req, u32 *status)
>>>> 	return err;
>>>> }
>>>> -static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32  
>>>> len)
>>>> +static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32  
>>>> len)
>>>> {
>>>> -	u32 head = READ_ONCE(desc->head);
>>>> +	u32 head;
>>>> 	u32 space;
>>>> -	space = CIRC_SPACE(desc->tail, head, desc->size);
>>>> +	if (ctb->space >= len)
>>>> +		return true;
>>>> +
>>>> +	head = READ_ONCE(ctb->desc->head);
>>>> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
>>>> +	ctb->space = space;
>>>> 	return space >= len;
>>>> }
>>>> @@ -462,7 +459,6 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
>>>> {
>>>> 	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>>>> 	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>>> -	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> 	int err;
>>>> 	GEM_BUG_ON(!ctch->enabled);
>>>> @@ -470,7 +466,7 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
>>>> 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>> 	lockdep_assert_held(&ct->send_lock);
>>>> -	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>> +	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
>>>> 		ct->retry++;
>>>> 		if (ct->retry >= MAX_RETRY)
>>>> 			return -EDEADLK;
>>>> @@ -496,7 +492,6 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>> 		     u32 *status)
>>>> {
>>>> 	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>>> -	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> 	struct ct_request request;
>>>> 	unsigned long flags;
>>>> 	u32 fence;
>>>> @@ -514,7 +509,7 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>> 	 */
>>>> retry:
>>>> 	spin_lock_irqsave(&ct->send_lock, flags);
>>>> -	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>> +	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
>>>> 		spin_unlock_irqrestore(&ct->send_lock, flags);
>>>> 		ct->retry++;
>>>> 		if (ct->retry >= MAX_RETRY)
>>>> @@ -544,7 +539,7 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>> 	if (response_buf)
>>>> 		err = wait_for_ct_request_update(&request, status);
>>>> 	else
>>>> -		err = wait_for_ctb_desc_update(desc, fence, status);
>>>> +		err = wait_for_ctb_desc_update(ctb, fence, status);
>>>> 	if (unlikely(err))
>>>> 		goto unlink;
>>>> @@ -618,9 +613,9 @@ static inline bool ct_header_is_response(u32  
>>>> header)
>>>> static int ctb_read(struct intel_guc_ct_buffer *ctb, u32 *data)
>>>> {
>>>> 	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> -	u32 head = desc->head / 4;	/* in dwords */
>>>> +	u32 head = ctb->head / 4;	/* in dwords */
>>>> 	u32 tail = desc->tail / 4;	/* in dwords */
>>>> -	u32 size = desc->size / 4;	/* in dwords */
>>>> +	u32 size = ctb->size / 4;	/* in dwords */
>>>> 	u32 *cmds = ctb->cmds;
>>>> 	s32 available;			/* in dwords */
>>>> 	unsigned int len;
>>>> @@ -664,7 +659,7 @@ static int ctb_read(struct intel_guc_ct_buffer  
>>>> *ctb, u32 *data)
>>>> 	}
>>>> 	CT_DEBUG_DRIVER("CT: received %*ph\n", 4 * len, data);
>>>> -	desc->head = head * 4;
>>>> +	ctb->head = desc->head = head * 4;
>>>> 	return 0;
>>>> }
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h  
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> index bc670a796bd8..1bff4f0b91f7 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> @@ -29,10 +29,18 @@ struct intel_guc;
>>>> *
>>>> * @desc: pointer to the buffer descriptor
>>>> * @cmds: pointer to the commands buffer
>>>> + * @size: local shadow copy of size
>>>
>>> I would rather expect this as real fixed size,
>>> note that size is not expected to change
>>>
>>
>> Yes, it is fixed over the life of the CTB but not all CTBs need to be the same
>> size, e.g. the H2G & G2H may, and likely will, be different sizes with the new
>> GuC interface.
>>
>>>> + * @head: local shadow copy of head
>>>> + * @tail: local shadow copy of tail
>>>> + * @space: local shadow copy of space
>>>> */
>>>> struct intel_guc_ct_buffer {
>>>> 	struct guc_ct_buffer_desc *desc;
>>>> 	u32 *cmds;
>>>> +	u32 size;
>>>> +	u32 tail;
>>>> +	u32 head;
>>>> +	u32 space;
>>>> };
>>>> /** Represents pair of command transport buffers.
>>>
>>> Can we reorder this patch to be first in the series ?
>>>
>>> Michal
>
> Tried this and I think it makes more sense the way it is. The 'space' value
> doesn't have a meaning before the non blocking call is introduced.

hmm, the blocking ctb_write also needs to check free space, and it is already
doing that on its own, so there is a need/use for "space"
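
As a rough illustration of why a cached "space" is useful to any producer,
blocking or not, here is a minimal userspace sketch under stated assumptions.
The CIRC_CNT/CIRC_SPACE macros repeat the arithmetic of <linux/circ_buf.h>
(size must be a power of two); struct fake_desc, struct fake_ctb and the
fake_ctb_*() helpers are hypothetical stand-ins, not the driver's types. The
cached value can only be too small, never too large, so the shared descriptor
is read only when the cached value says there is no room.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as <linux/circ_buf.h>; size must be a power of two. */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Hypothetical stand-in for the shared descriptor: the consumer moves head. */
struct fake_desc {
	volatile uint32_t head;
};

/* Hypothetical producer-side state; all values are in bytes. */
struct fake_ctb {
	struct fake_desc *desc;
	uint32_t size;   /* fixed for the lifetime of the buffer */
	uint32_t tail;   /* producer write offset */
	uint32_t space;  /* cached free space, possibly stale (too small) */
};

static void fake_ctb_init(struct fake_ctb *ctb, struct fake_desc *desc,
			  uint32_t size)
{
	assert(size && (size & (size - 1)) == 0);
	desc->head = 0;
	ctb->desc = desc;
	ctb->size = size;
	ctb->tail = 0;
	ctb->space = CIRC_SPACE(ctb->tail, desc->head, size);
}

/* Fast path: trust the cache. Slow path: re-read head from the descriptor. */
static bool fake_ctb_has_room(struct fake_ctb *ctb, uint32_t len)
{
	if (ctb->space >= len)
		return true;

	ctb->space = CIRC_SPACE(ctb->tail, ctb->desc->head, ctb->size);
	return ctb->space >= len;
}

/* Account for len bytes written; call only after fake_ctb_has_room(). */
static void fake_ctb_commit(struct fake_ctb *ctb, uint32_t len)
{
	ctb->tail = (ctb->tail + len) & (ctb->size - 1);
	ctb->space -= len;
}
```

With a 64-byte buffer, three 16-byte writes leave 15 bytes in the cache, so a
fourth check falls through to the descriptor read; once the consumer has moved
head to 32, the refreshed value becomes 47 and the write can proceed.
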

> Also it ends
> up changing a bunch of code that is then deleted in the non blocking call
> patch. Better to leave it as is.

this patch as 2nd also modifies some lines already touched by the 1st patch,
but after reordering there will be a better chance of merging this patch earlier

>
> Matt
>
>>
>> Yes.
>>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Intel-gfx] [PATCH 2/3] drm/i915/guc: Optimized CTB writes and reads
@ 2019-11-25 13:48           ` Michal Wajdeczko
  0 siblings, 0 replies; 36+ messages in thread
From: Michal Wajdeczko @ 2019-11-25 13:48 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Intel-GFX

On Fri, 22 Nov 2019 02:29:59 +0100, Matthew Brost  
<matthew.brost@intel.com> wrote:

> On Thu, Nov 21, 2019 at 07:56:07AM -0800, Matthew Brost wrote:
>> On Thu, Nov 21, 2019 at 12:58:50PM +0100, Michal Wajdeczko wrote:
>>> On Thu, 21 Nov 2019 00:56:03 +0100, <John.C.Harrison@intel.com> wrote:
>>>
>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>
>>>> CTB writes are now in the path of command submission and should be
>>>> optimized for performance. Rather than reading CTB descriptor values
>>>> (e.g. head, tail, size) which could result in accesses across the PCIe
>>>> bus, store shadow local copies and only read/write the descriptor
>>>> values when absolutely necessary.
>>>>
>>>> Cc: John Harrison <john.c.harrison@intel.com>
>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 79 +++++++++++------------
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  8 +++
>>>> 2 files changed, 45 insertions(+), 42 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> index e50d968b15d5..4d8a4c6afd71 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> @@ -68,23 +68,29 @@ static inline const char *guc_ct_buffer_type_to_str(u32 type)
>>>> 	}
>>>> }
>>>> -static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc,
>>>> +static void guc_ct_buffer_desc_init(struct intel_guc_ct_buffer *ctb,
>>>> 				    u32 cmds_addr, u32 size, u32 owner)
>>>
>>> As this function now takes ctb instead of desc, it should be renamed
>>> or made separate from guc_ct_buffer_desc_init.
>>>
>>
>> Yes, makes sense.
>>
>>>> {
>>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> 	CT_DEBUG_DRIVER("CT: desc %p init addr=%#x size=%u owner=%u\n",
>>>> 			desc, cmds_addr, size, owner);
>>>> 	memset(desc, 0, sizeof(*desc));
>>>> 	desc->addr = cmds_addr;
>>>> -	desc->size = size;
>>>> +	ctb->size = desc->size = size;
>>>> 	desc->owner = owner;
>>>> +	ctb->tail = 0;
>>>> +	ctb->head = 0;
>>>> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>>>> }
>>>> -static void guc_ct_buffer_desc_reset(struct guc_ct_buffer_desc *desc)
>>>> +static void guc_ct_buffer_desc_reset(struct intel_guc_ct_buffer *ctb)
>>>
>>> same here
>>>
>>
>> Same.
>>
>>>> {
>>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> 	CT_DEBUG_DRIVER("CT: desc %p reset head=%u tail=%u\n",
>>>> 			desc, desc->head, desc->tail);
>>>> -	desc->head = 0;
>>>> -	desc->tail = 0;
>>>> +	ctb->head = desc->head = 0;
>>>> +	ctb->tail = desc->tail = 0;
>>>> +	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
>>>> 	desc->is_in_error = 0;
>>>> }
>>>> @@ -220,7 +226,7 @@ static int ctch_enable(struct intel_guc *guc,
>>>> 	 */
>>>> 	for (i = 0; i < ARRAY_SIZE(ctch->ctbs); i++) {
>>>> 		GEM_BUG_ON((i != CTB_SEND) && (i != CTB_RECV));
>>>> -		guc_ct_buffer_desc_init(ctch->ctbs[i].desc,
>>>> +		guc_ct_buffer_desc_init(&ctch->ctbs[i],
>>>> 					base + PAGE_SIZE/4 * i + PAGE_SIZE/2,
>>>> 					PAGE_SIZE/4,
>>>> 					ctch->owner);
>>>> @@ -301,32 +307,16 @@ static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>>> 		     bool want_response)
>>>> {
>>>> 	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> -	u32 head = desc->head / 4;	/* in dwords */
>>>> -	u32 tail = desc->tail / 4;	/* in dwords */
>>>> -	u32 size = desc->size / 4;	/* in dwords */
>>>> -	u32 used;			/* in dwords */
>>>> +	u32 tail = ctb->tail / 4;	/* in dwords */
>>>> +	u32 size = ctb->size / 4;	/* in dwords */
>>>> 	u32 header;
>>>> 	u32 *cmds = ctb->cmds;
>>>> 	unsigned int i;
>>>> -	GEM_BUG_ON(desc->size % 4);
>>>> -	GEM_BUG_ON(desc->head % 4);
>>>> -	GEM_BUG_ON(desc->tail % 4);
>>>> +	GEM_BUG_ON(ctb->size % 4);
>>>> +	GEM_BUG_ON(ctb->tail % 4);
>>>> 	GEM_BUG_ON(tail >= size);
>>>> -	/*
>>>> -	 * tail == head condition indicates empty. GuC FW does not support
>>>> -	 * using up the entire buffer to get tail == head meaning full.
>>>> -	 */
>>>> -	if (tail < head)
>>>> -		used = (size - head) + tail;
>>>> -	else
>>>> -		used = tail - head;
>>>> -
>>>> -	/* make sure there is a space including extra dw for the fence */
>>>> -	if (unlikely(used + len + 1 >= size))
>>>> -		return -ENOSPC;
>>>> -
>>>> 	/*
>>>> 	 * Write the message. The format is the following:
>>>> 	 * DW0: header (including action code)
>>>> @@ -354,15 +344,16 @@ static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>>> 	}
>>>> 	/* now update desc tail (back in bytes) */
>>>> -	desc->tail = tail * 4;
>>>> -	GEM_BUG_ON(desc->tail > desc->size);
>>>> +	ctb->tail = desc->tail = tail * 4;
>>>> +	ctb->space -= (len + 1) * 4;
>>>> +	GEM_BUG_ON(ctb->tail > ctb->size);
>>>> 	return 0;
>>>> }
>>>> /**
>>>> * wait_for_ctb_desc_update - Wait for the CT buffer descriptor update.
>>>> - * @desc:	buffer descriptor
>>>> + * @ctb:	ctb buffer
>>>> * @fence:	response fence
>>>> * @status:	placeholder for status
>>>> *
>>>> @@ -376,11 +367,12 @@ static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>>> * *	-ETIMEDOUT no response within hardcoded timeout
>>>> * *	-EPROTO no response, CT buffer is in error
>>>> */
>>>> -static int wait_for_ctb_desc_update(struct guc_ct_buffer_desc *desc,
>>>> +static int wait_for_ctb_desc_update(struct intel_guc_ct_buffer *ctb,
>>>> 				    u32 fence,
>>>> 				    u32 *status)
>>>> {
>>>> 	int err;
>>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> 	/*
>>>> 	 * Fast commands should complete in less than 10us, so sample quickly
>>>> @@ -401,7 +393,7 @@ static int wait_for_ctb_desc_update(struct guc_ct_buffer_desc *desc,
>>>> 			/* Something went wrong with the messaging, try to reset
>>>> 			 * the buffer and hope for the best
>>>> 			 */
>>>> -			guc_ct_buffer_desc_reset(desc);
>>>> +			guc_ct_buffer_desc_reset(ctb);
>>>> 			err = -EPROTO;
>>>> 		}
>>>> 	}
>>>> @@ -446,12 +438,17 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
>>>> 	return err;
>>>> }
>>>> -static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32 len)
>>>> +static inline bool ctb_has_room(struct intel_guc_ct_buffer *ctb, u32 len)
>>>> {
>>>> -	u32 head = READ_ONCE(desc->head);
>>>> +	u32 head;
>>>> 	u32 space;
>>>> -	space = CIRC_SPACE(desc->tail, head, desc->size);
>>>> +	if (ctb->space >= len)
>>>> +		return true;
>>>> +
>>>> +	head = READ_ONCE(ctb->desc->head);
>>>> +	space = CIRC_SPACE(ctb->tail, head, ctb->size);
>>>> +	ctb->space = space;
>>>> 	return space >= len;
>>>> }
>>>> @@ -462,7 +459,6 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
>>>> {
>>>> 	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>>>> 	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>>> -	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> 	int err;
>>>> 	GEM_BUG_ON(!ctch->enabled);
>>>> @@ -470,7 +466,7 @@ int intel_guc_send_nb(struct intel_guc_ct *ct,
>>>> 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>> 	lockdep_assert_held(&ct->send_lock);
>>>> -	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>> +	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
>>>> 		ct->retry++;
>>>> 		if (ct->retry >= MAX_RETRY)
>>>> 			return -EDEADLK;
>>>> @@ -496,7 +492,6 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>> 		     u32 *status)
>>>> {
>>>> 	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>>> -	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> 	struct ct_request request;
>>>> 	unsigned long flags;
>>>> 	u32 fence;
>>>> @@ -514,7 +509,7 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>> 	 */
>>>> retry:
>>>> 	spin_lock_irqsave(&ct->send_lock, flags);
>>>> -	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>> +	if (unlikely(!ctb_has_room(ctb, (len + 1) * 4))) {
>>>> 		spin_unlock_irqrestore(&ct->send_lock, flags);
>>>> 		ct->retry++;
>>>> 		if (ct->retry >= MAX_RETRY)
>>>> @@ -544,7 +539,7 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>> 	if (response_buf)
>>>> 		err = wait_for_ct_request_update(&request, status);
>>>> 	else
>>>> -		err = wait_for_ctb_desc_update(desc, fence, status);
>>>> +		err = wait_for_ctb_desc_update(ctb, fence, status);
>>>> 	if (unlikely(err))
>>>> 		goto unlink;
>>>> @@ -618,9 +613,9 @@ static inline bool ct_header_is_response(u32 header)
>>>> static int ctb_read(struct intel_guc_ct_buffer *ctb, u32 *data)
>>>> {
>>>> 	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> -	u32 head = desc->head / 4;	/* in dwords */
>>>> +	u32 head = ctb->head / 4;	/* in dwords */
>>>> 	u32 tail = desc->tail / 4;	/* in dwords */
>>>> -	u32 size = desc->size / 4;	/* in dwords */
>>>> +	u32 size = ctb->size / 4;	/* in dwords */
>>>> 	u32 *cmds = ctb->cmds;
>>>> 	s32 available;			/* in dwords */
>>>> 	unsigned int len;
>>>> @@ -664,7 +659,7 @@ static int ctb_read(struct intel_guc_ct_buffer *ctb, u32 *data)
>>>> 	}
>>>> 	CT_DEBUG_DRIVER("CT: received %*ph\n", 4 * len, data);
>>>> -	desc->head = head * 4;
>>>> +	ctb->head = desc->head = head * 4;
>>>> 	return 0;
>>>> }
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> index bc670a796bd8..1bff4f0b91f7 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> @@ -29,10 +29,18 @@ struct intel_guc;
>>>> *
>>>> * @desc: pointer to the buffer descriptor
>>>> * @cmds: pointer to the commands buffer
>>>> + * @size: local shadow copy of size
>>>
>>> I would rather expect this to be the real, fixed size;
>>> note that size is not expected to change.
>>>
>>
>> Yes, it is fixed over the life of the CTB, but not all CTBs need to be
>> the same size. E.g. the H2G & G2H may, and likely will, be different
>> sizes with the new GuC interface.
>>
>>>> + * @head: local shadow copy of head
>>>> + * @tail: local shadow copy of tail
>>>> + * @space: local shadow copy of space
>>>> */
>>>> struct intel_guc_ct_buffer {
>>>> 	struct guc_ct_buffer_desc *desc;
>>>> 	u32 *cmds;
>>>> +	u32 size;
>>>> +	u32 tail;
>>>> +	u32 head;
>>>> +	u32 space;
>>>> };
>>>> /** Represents pair of command transport buffers.
>>>
>>> Can we reorder this patch to be first in the series ?
>>>
>>> Michal
>
> Tried this and I think it makes more sense the way it is. The 'space'
> value doesn't have a meaning before the non blocking call is introduced.

Hmm, the blocking ctb_write also needs to check free space, and it already
does that on its own, so there is already a need/use for "space".
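
For reference, this is how I read the role of the "space" shadow in
ctb_has_room: the cached value only ever under-estimates the real free
space, so the descriptor head needs to be re-read only when the cached
value is too small. What follows is a rough userspace sketch under that
assumption, not the driver implementation. All demo_* names and the
desc_reads counter are hypothetical, and the CIRC_* macros are re-declared
locally with the same arithmetic as <linux/circ_buf.h> (power-of-two size).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as <linux/circ_buf.h>; size must be a power of two. */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Hypothetical stand-in for the shared descriptor; values are in bytes. */
struct demo_desc {
	volatile uint32_t head;	/* advanced by the consumer */
	volatile uint32_t tail;	/* advanced by the producer */
};

/* Hypothetical stand-in for the host-side buffer with shadow fields. */
struct demo_ctb {
	struct demo_desc *desc;
	uint32_t size;
	uint32_t tail;
	uint32_t space;
	unsigned int desc_reads;	/* how often desc->head was read */
};

static void demo_init(struct demo_ctb *ctb, struct demo_desc *desc,
		      uint32_t size)
{
	desc->head = 0;
	desc->tail = 0;
	ctb->desc = desc;
	ctb->size = size;
	ctb->tail = 0;
	ctb->space = CIRC_SPACE(ctb->tail, 0u, ctb->size);
	ctb->desc_reads = 0;
}

static bool demo_has_room(struct demo_ctb *ctb, uint32_t bytes)
{
	/* Fast path: the shadow value is a safe lower bound. */
	if (ctb->space >= bytes)
		return true;

	/* Slow path: refresh the shadow from the shared descriptor. */
	ctb->desc_reads++;
	ctb->space = CIRC_SPACE(ctb->tail, ctb->desc->head, ctb->size);
	return ctb->space >= bytes;
}

static bool demo_write(struct demo_ctb *ctb, uint32_t bytes)
{
	if (!demo_has_room(ctb, bytes))
		return false;

	/* Payload copy omitted; only the index bookkeeping is shown. */
	ctb->tail = (ctb->tail + bytes) & (ctb->size - 1);
	ctb->desc->tail = ctb->tail;
	ctb->space -= bytes;
	return true;
}
```

With a 64-byte buffer, three 16-byte writes go through without touching
desc->head at all; the descriptor is read only once the cached space drops
below the request size.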

> Also it ends up changing a bunch of code that is then deleted in the
> non blocking call patch. Better to leave it as is.

As the 2nd patch, this one also modifies some lines already touched by the
1st patch, but after reordering there is a better chance of merging this
patch earlier.

>
> Matt
>
>>
>> Yes.


end of thread, other threads:[~2019-11-25 13:48 UTC | newest]

Thread overview: 36+ messages
2019-11-20 23:56 [PATCH 0/3] drm/i915/guc: CTB improvements John.C.Harrison
2019-11-20 23:56 ` [Intel-gfx] " John.C.Harrison
2019-11-20 23:56 ` [PATCH 1/3] drm/i915/guc: Add non blocking CTB send function John.C.Harrison
2019-11-20 23:56   ` [Intel-gfx] " John.C.Harrison
2019-11-21 11:43   ` Michal Wajdeczko
2019-11-21 11:43     ` [Intel-gfx] " Michal Wajdeczko
2019-11-22  0:13     ` Matthew Brost
2019-11-22  0:13       ` [Intel-gfx] " Matthew Brost
2019-11-22  1:34       ` Matthew Brost
2019-11-22  1:34         ` [Intel-gfx] " Matthew Brost
2019-11-25 13:39         ` Michal Wajdeczko
2019-11-25 13:39           ` [Intel-gfx] " Michal Wajdeczko
2019-11-25 13:20       ` Michal Wajdeczko
2019-11-25 13:20         ` [Intel-gfx] " Michal Wajdeczko
2019-11-20 23:56 ` [PATCH 2/3] drm/i915/guc: Optimized CTB writes and reads John.C.Harrison
2019-11-20 23:56   ` [Intel-gfx] " John.C.Harrison
2019-11-21 11:58   ` Michal Wajdeczko
2019-11-21 11:58     ` [Intel-gfx] " Michal Wajdeczko
2019-11-21 15:56     ` Matthew Brost
2019-11-21 15:56       ` [Intel-gfx] " Matthew Brost
2019-11-22  1:29       ` Matthew Brost
2019-11-22  1:29         ` [Intel-gfx] " Matthew Brost
2019-11-25 13:48         ` Michal Wajdeczko
2019-11-25 13:48           ` [Intel-gfx] " Michal Wajdeczko
2019-11-20 23:56 ` [PATCH 3/3] drm/i915/guc: Increase size of CTB buffers John.C.Harrison
2019-11-20 23:56   ` [Intel-gfx] " John.C.Harrison
2019-11-21 12:25   ` Michal Wajdeczko
2019-11-21 12:25     ` [Intel-gfx] " Michal Wajdeczko
2019-11-21 16:11     ` Matthew Brost
2019-11-21 16:11       ` [Intel-gfx] " Matthew Brost
2019-11-21  2:49 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915/guc: CTB improvements Patchwork
2019-11-21  2:49   ` [Intel-gfx] " Patchwork
2019-11-21  3:16 ` ✓ Fi.CI.BAT: success " Patchwork
2019-11-21  3:16   ` [Intel-gfx] " Patchwork
2019-11-22  4:45 ` ✓ Fi.CI.IGT: " Patchwork
2019-11-22  4:45   ` [Intel-gfx] " Patchwork
