[Intel-gfx] [PATCH v5 5/5] drm/i915/gt: Make sure that errors are propagated through request chains

public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed

From: Andi Shyti <andi.shyti@linux.intel.com>
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	stable@vger.kernel.org
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>,
	Andi Shyti <andi.shyti@kernel.org>,
	Matthew Auld <matthew.auld@intel.com>,
	Andrzej Hajda <andrzej.hajda@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Chris Wilson <chris.p.wilson@linux.intel.com>,
	Nirmoy Das <nirmoy.das@intel.com>
Subject: [Intel-gfx] [PATCH v5 5/5] drm/i915/gt: Make sure that errors are propagated through request chains
Date: Wed, 12 Apr 2023 13:33:08 +0200	[thread overview]
Message-ID: <20230412113308.812468-6-andi.shyti@linux.intel.com> (raw)
In-Reply-To: <20230412113308.812468-1-andi.shyti@linux.intel.com>

Currently, when we perform operations such as clearing or copying
large blocks of memory, we generate multiple requests that are
executed in a chain.

However, if one of these requests fails, we may not realize it
unless it happens to be the last request in the chain. This is
because errors are not properly propagated.

For this we need to keep propagating the chain of fence
notification in order to always reach the final fence associated
to the final request.

To address this issue, we need to ensure that the chain of fence
notifications is always propagated so that we can reach the final
fence associated with the last request. By doing so, we will be
able to detect any memory operation  failures and determine
whether the memory is still invalid.

On copy and clear migration signal fences upon completion.

On copy and clear migration, signal fences upon request
completion to ensure that we have a reliable perpetuation of the
operation outcome.

Fixes: cf586021642d80 ("drm/i915/gt: Pipelined page migration")
Reported-by: Matthew Auld <matthew.auld@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 51 +++++++++++++++++++------
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 3f638f1987968..668c95af8cbcf 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -742,13 +742,19 @@ intel_context_migrate_copy(struct intel_context *ce,
 			dst_offset = 2 * CHUNK_SZ;
 	}
 
+	/*
+	 * While building the chain of requests, we need to ensure
+	 * that no one can sneak into the timeline unnoticed.
+	 */
+	mutex_lock(&ce->timeline->mutex);
+
 	do {
 		int len;
 
-		rq = i915_request_create(ce);
+		rq = i915_request_create_locked(ce);
 		if (IS_ERR(rq)) {
 			err = PTR_ERR(rq);
-			goto out_ce;
+			break;
 		}
 
 		if (deps) {
@@ -878,10 +884,14 @@ intel_context_migrate_copy(struct intel_context *ce,
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-		if (*out)
+		i915_sw_fence_await(&rq->submit);
+		i915_request_get(rq);
+		i915_request_add_locked(rq);
+		if (*out) {
+			i915_sw_fence_complete(&(*out)->submit);
 			i915_request_put(*out);
-		*out = i915_request_get(rq);
-		i915_request_add(rq);
+		}
+		*out = rq;
 
 		if (err)
 			break;
@@ -905,7 +915,10 @@ intel_context_migrate_copy(struct intel_context *ce,
 		cond_resched();
 	} while (1);
 
-out_ce:
+	mutex_unlock(&ce->timeline->mutex);
+
+	if (*out)
+		i915_sw_fence_complete(&(*out)->submit);
 	return err;
 }
 
@@ -999,13 +1012,19 @@ intel_context_migrate_clear(struct intel_context *ce,
 	if (HAS_64K_PAGES(i915) && is_lmem)
 		offset = CHUNK_SZ;
 
+	/*
+	 * While building the chain of requests, we need to ensure
+	 * that no one can sneak into the timeline unnoticed.
+	 */
+	mutex_lock(&ce->timeline->mutex);
+
 	do {
 		int len;
 
-		rq = i915_request_create(ce);
+		rq = i915_request_create_locked(ce);
 		if (IS_ERR(rq)) {
 			err = PTR_ERR(rq);
-			goto out_ce;
+			break;
 		}
 
 		if (deps) {
@@ -1056,17 +1075,25 @@ intel_context_migrate_clear(struct intel_context *ce,
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-		if (*out)
+		i915_sw_fence_await(&rq->submit);
+		i915_request_get(rq);
+		i915_request_add_locked(rq);
+		if (*out) {
+			i915_sw_fence_complete(&(*out)->submit);
 			i915_request_put(*out);
-		*out = i915_request_get(rq);
-		i915_request_add(rq);
+		}
+		*out = rq;
+
 		if (err || !it.sg || !sg_dma_len(it.sg))
 			break;
 
 		cond_resched();
 	} while (1);
 
-out_ce:
+	mutex_unlock(&ce->timeline->mutex);
+
+	if (*out)
+		i915_sw_fence_complete(&(*out)->submit);
 	return err;
 }
 
-- 
2.39.2

next prev parent reply	other threads:[~2023-04-12 11:34 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-12 11:33 [Intel-gfx] [PATCH v5 0/5] Fix error propagation amongst request Andi Shyti
2023-04-12 11:33 ` [Intel-gfx] [PATCH v5 1/5] drm/i915/gt: Add intel_context_timeline_is_locked helper Andi Shyti
2023-04-12 12:12   ` Andrzej Hajda
2023-04-12 11:33 ` [Intel-gfx] [PATCH v5 2/5] drm/i915: Create the locked version of the request create Andi Shyti
2023-04-12 13:04   ` Andrzej Hajda
2023-04-13  8:53     ` Andi Shyti
2023-04-12 11:33 ` [Intel-gfx] [PATCH v5 3/5] drm/i915: Create the locked version of the request add Andi Shyti
2023-04-12 13:06   ` Andrzej Hajda
2023-04-13  8:57     ` Andi Shyti
2023-04-12 11:33 ` [Intel-gfx] [PATCH v5 4/5] drm/i915: Throttle for ringspace prior to taking the timeline mutex Andi Shyti
2023-04-12 11:33 ` Andi Shyti [this message]
2023-04-13 11:56   ` [Intel-gfx] [PATCH v5 5/5] drm/i915/gt: Make sure that errors are propagated through request chains Tvrtko Ursulin
2023-04-13 12:43     ` Tvrtko Ursulin
2023-05-04  9:45       ` Andi Shyti
2023-05-04  9:37     ` Andi Shyti
2023-04-12 17:28 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Fix error propagation amongst request (rev3) Patchwork
2023-04-12 17:28 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2023-04-12 17:41 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-04-13  3:47 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:3f638f198796 dfblob:668c95af8cbc )
 OR (
bs:"[Intel-gfx] [PATCH v5 5/5] drm/i915/gt: Make sure that errors are propagated through request chains" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230412113308.812468-6-andi.shyti@linux.intel.com \
    --to=andi.shyti@linux.intel.com \
    --cc=andi.shyti@kernel.org \
    --cc=andrzej.hajda@intel.com \
    --cc=chris.p.wilson@linux.intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=maciej.patelczyk@intel.com \
    --cc=matthew.auld@intel.com \
    --cc=nirmoy.das@intel.com \
    --cc=rodrigo.vivi@intel.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox