Intel-XE Archive on lore.kernel.org
* [PATCH v4 0/7] Only timeout jobs if they run longer than timeout period
From: Matthew Brost @ 2024-06-10 13:50 UTC
  To: intel-xe

Debugging [1] hit a known flaw in the job timeout mechanism: jobs time
out based on how long ago they were submitted to the GuC, not on how
long they have actually been running on the hardware. This series
attempts to fix that.

The algorithm is as follows (a minimal sketch in C follows the list):
- Copy the ctx timestamp from the LRC to a saved location at the
  beginning of every job
- On TDR, kick jobs off the hardware via a schedule disable so the ctx
  timestamp is updated
- Compare the ctx timestamp to the saved ctx timestamp; if the job has
  been running for less than the timeout period, re-enable scheduling
  and restart the TDR
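
Here is a minimal sketch of the wrap-safe tick delta and millisecond
conversion this relies on, written as standalone userspace C. The
19.2 MHz reference clock and both timestamp values below are made-up
for illustration; the driver reads the real reference clock from
hardware:

#include <stdint.h>
#include <stdio.h>

#define REF_CLOCK_HZ 19200000ull	/* assumed reference clock */

/* Wrap-safe delta between two 32-bit ctx timestamps; unsigned
 * subtraction handles the counter wrapping past U32_MAX. */
static uint32_t ctx_ticks_elapsed(uint32_t now, uint32_t saved)
{
	return now - saved;
}

/* Round-up conversion from clock ticks to milliseconds. */
static uint64_t ticks_to_ms(uint64_t ticks)
{
	return (ticks * 1000 + REF_CLOCK_HZ - 1) / REF_CLOCK_HZ;
}

int main(void)
{
	uint32_t saved = 0xfffff000u;	/* sampled at job start */
	uint32_t now = 0x00001000u;	/* sampled after the counter wrapped */

	printf("job ran for ~%llu ms\n",
	       (unsigned long long)ticks_to_ms(ctx_ticks_elapsed(now, saved)));
	return 0;
}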

A new job-cancel IGT [2] is available for testing.

v2:
- Promote to non-RFC as the issues I viewed as blockers have been resolved
- Address Jani and Michal v1 feedback
- Add GT clock timer calculation
v3:
- More testing
- Fix TDR state machine bugs exposed in testing
- Rebase for CI
v4:
- Address a few comments by John H
- Fix CI failure [3]

Matt

[1] https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/799
[2] https://patchwork.freedesktop.org/series/134640/
[3] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-134642v1/shard-dg2-433/igt@xe_exec_threads@threads-hang-fd-rebind.html

Matthew Brost (7):
  drm/xe: Add ctx timestamp to LRC snapshot
  drm/xe: Add xe_gt_clock_interval_to_ms helper
  drm/xe: Improve unexpected state error messages
  drm/xe: Add GuC state asserts to deregister_exec_queue
  drm/xe: Add pending disable assert to handle_sched_done
  drm/xe: Add killed, banned, or wedged as sticky bits during GuC reset
  drm/xe: Sample ctx timestamp to determine if jobs have timed out

 drivers/gpu/drm/xe/xe_gt_clock.c   |  18 ++
 drivers/gpu/drm/xe/xe_gt_clock.h   |   1 +
 drivers/gpu/drm/xe/xe_guc_submit.c | 316 +++++++++++++++++++++++------
 drivers/gpu/drm/xe/xe_lrc.c        |   6 +
 4 files changed, 277 insertions(+), 64 deletions(-)

-- 
2.34.1



* [PATCH v4 1/7] drm/xe: Add ctx timestamp to LRC snapshot
From: Matthew Brost @ 2024-06-10 13:50 UTC
  To: intel-xe

The ctx timestamp is useful information; add it to the LRC snapshot.

v2:
  - s/ctx_timestamp_job/ctx_job_timestamp

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_lrc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 0fef354c6489..f3d77aa1750d 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -49,6 +49,8 @@ struct xe_lrc_snapshot {
 	} tail;
 	u32 start_seqno;
 	u32 seqno;
+	u32 ctx_timestamp;
+	u32 ctx_job_timestamp;
 };
 
 static struct xe_device *
@@ -1642,6 +1644,8 @@ struct xe_lrc_snapshot *xe_lrc_snapshot_capture(struct xe_lrc *lrc)
 	snapshot->lrc_offset = xe_lrc_pphwsp_offset(lrc);
 	snapshot->lrc_size = lrc->bo->size - snapshot->lrc_offset;
 	snapshot->lrc_snapshot = NULL;
+	snapshot->ctx_timestamp = xe_lrc_ctx_timestamp(lrc);
+	snapshot->ctx_job_timestamp = xe_lrc_ctx_job_timestamp(lrc);
 	return snapshot;
 }
 
@@ -1690,6 +1694,8 @@ void xe_lrc_snapshot_print(struct xe_lrc_snapshot *snapshot, struct drm_printer
 		   snapshot->tail.internal, snapshot->tail.memory);
 	drm_printf(p, "\tStart seqno: (memory) %d\n", snapshot->start_seqno);
 	drm_printf(p, "\tSeqno: (memory) %d\n", snapshot->seqno);
+	drm_printf(p, "\tTimestamp: 0x%08x\n", snapshot->ctx_timestamp);
+	drm_printf(p, "\tJob Timestamp: 0x%08x\n", snapshot->ctx_job_timestamp);
 
 	if (!snapshot->lrc_snapshot)
 		return;
-- 
2.34.1



* [PATCH v4 2/7] drm/xe: Add xe_gt_clock_interval_to_ms helper
From: Matthew Brost @ 2024-06-10 13:50 UTC
  To: intel-xe

Add a helper to convert GT clock ticks to milliseconds. This is useful
for determining whether a timeout has occurred by examining GT clock
ticks (see the usage sketch below).
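
A hedged usage sketch (the wrapper below is illustrative only, not code
from this series): a timeout check could be built directly on the new
helper:

/* Illustrative only: decide whether a sampled tick interval exceeds a
 * timeout, using the new helper. */
static bool interval_exceeds_timeout(struct xe_gt *gt, u64 ticks,
				     u32 timeout_ms)
{
	return xe_gt_clock_interval_to_ms(gt, ticks) >= timeout_ms;
}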

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_clock.c | 18 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_clock.h |  1 +
 2 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt_clock.c b/drivers/gpu/drm/xe/xe_gt_clock.c
index 9ff2061133df..a9392a743fd5 100644
--- a/drivers/gpu/drm/xe/xe_gt_clock.c
+++ b/drivers/gpu/drm/xe/xe_gt_clock.c
@@ -79,3 +79,21 @@ int xe_gt_clock_init(struct xe_gt *gt)
 	gt->info.reference_clock = freq;
 	return 0;
 }
+
+static u64 div_u64_roundup(u64 nom, u32 den)
+{
+	return div_u64(nom + den - 1, den);
+}
+
+/**
+ * xe_gt_clock_interval_to_ms - Convert sampled GT clock ticks to msec
+ *
+ * @gt: the &xe_gt
+ * @count: count of GT clock ticks
+ *
+ * Returns: time in msec
+ */
+u64 xe_gt_clock_interval_to_ms(struct xe_gt *gt, u64 count)
+{
+	return div_u64_roundup(count * MSEC_PER_SEC, gt->info.reference_clock);
+}
diff --git a/drivers/gpu/drm/xe/xe_gt_clock.h b/drivers/gpu/drm/xe/xe_gt_clock.h
index 44fa0371b973..3adeb7baaca4 100644
--- a/drivers/gpu/drm/xe/xe_gt_clock.h
+++ b/drivers/gpu/drm/xe/xe_gt_clock.h
@@ -11,5 +11,6 @@
 struct xe_gt;
 
 int xe_gt_clock_init(struct xe_gt *gt);
+u64 xe_gt_clock_interval_to_ms(struct xe_gt *gt, u64 count);
 
 #endif
-- 
2.34.1



* [PATCH v4 3/7] drm/xe: Improve unexpected state error messages
From: Matthew Brost @ 2024-06-10 13:50 UTC
  To: intel-xe

Include the G2H handler name in unexpected state error messages.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 4464ba337d12..766ff8e48dde 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1632,8 +1632,8 @@ int xe_guc_sched_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 
 	if (unlikely(!exec_queue_pending_enable(q) &&
 		     !exec_queue_pending_disable(q))) {
-		drm_err(&xe->drm, "Unexpected engine state 0x%04x",
-			atomic_read(&q->guc->state));
+		drm_err(&xe->drm, "SCHED_DONE: Unexpected engine state 0x%04x, guc_id=%d",
+			atomic_read(&q->guc->state), q->guc->id);
 		return -EPROTO;
 	}
 
@@ -1671,8 +1671,8 @@ int xe_guc_deregister_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 
 	if (!exec_queue_destroyed(q) || exec_queue_pending_disable(q) ||
 	    exec_queue_pending_enable(q) || exec_queue_enabled(q)) {
-		drm_err(&xe->drm, "Unexpected engine state 0x%04x",
-			atomic_read(&q->guc->state));
+		drm_err(&xe->drm, "DEREGISTER_DONE: Unexpected engine state 0x%04x, guc_id=%d",
+			atomic_read(&q->guc->state), q->guc->id);
 		return -EPROTO;
 	}
 
-- 
2.34.1



* [PATCH v4 4/7] drm/xe: Add GuC state asserts to deregister_exec_queue
From: Matthew Brost @ 2024-06-10 13:50 UTC
  To: intel-xe

Will help catch bugs in the GuC state machine.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 766ff8e48dde..e697732f1e74 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1587,6 +1587,11 @@ static void deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
 		q->guc->id,
 	};
 
+	xe_gt_assert(guc_to_gt(guc), exec_queue_destroyed(q));
+	xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
+	xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
+	xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_enable(q));
+
 	trace_xe_exec_queue_deregister(q);
 
 	xe_guc_ct_send_g2h_handler(&guc->ct, action, ARRAY_SIZE(action));
-- 
2.34.1



* [PATCH v4 5/7] drm/xe: Add pending disable assert to handle_sched_done
From: Matthew Brost @ 2024-06-10 13:50 UTC
  To: intel-xe

Will help catch bugs in the GuC state machine.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index e697732f1e74..9e1535c2d8f3 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1607,6 +1607,8 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q)
 		smp_wmb();
 		wake_up_all(&guc->ct.wq);
 	} else {
+		xe_gt_assert(guc_to_gt(guc), exec_queue_pending_disable(q));
+
 		clear_exec_queue_pending_disable(q);
 		if (q->guc->suspend_pending) {
 			suspend_fence_signal(q);
-- 
2.34.1



* [PATCH v4 6/7] drm/xe: Add killed, banned, or wedged as sticky bits during GuC reset
From: Matthew Brost @ 2024-06-10 13:50 UTC
  To: intel-xe

These bits should persist across a reset; treat them as such (see the
sketch below).
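
For context, atomic_and() with a keep-mask is what makes bits sticky:
every bit outside the mask is cleared in one atomic step. A simplified
sketch, where STICKY_A, STICKY_B, and PENDING are hypothetical flags:

/* atomic_and(mask, v) computes *v &= mask atomically, so bits outside
 * the mask (PENDING here) are cleared while sticky bits survive. */
atomic_t state = ATOMIC_INIT(STICKY_A | STICKY_B | PENDING);

atomic_and(STICKY_A | STICKY_B, &state);
/* state now holds STICKY_A | STICKY_B; PENDING is gone */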

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 9e1535c2d8f3..3db0aa40535d 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1444,7 +1444,9 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
 		set_exec_queue_suspended(q);
 		suspend_fence_signal(q);
 	}
-	atomic_and(EXEC_QUEUE_STATE_DESTROYED | EXEC_QUEUE_STATE_SUSPENDED,
+	atomic_and(EXEC_QUEUE_STATE_WEDGED | EXEC_QUEUE_STATE_BANNED |
+		   EXEC_QUEUE_STATE_KILLED | EXEC_QUEUE_STATE_DESTROYED |
+		   EXEC_QUEUE_STATE_SUSPENDED,
 		   &q->guc->state);
 	q->guc->resume_time = 0;
 	trace_xe_exec_queue_stop(q);
-- 
2.34.1



* [PATCH v4 7/7] drm/xe: Sample ctx timestamp to determine if jobs have timed out
From: Matthew Brost @ 2024-06-10 13:50 UTC
  To: intel-xe

In the GuC TDR, sample the ctx timestamp to determine whether jobs have
actually timed out. Scheduling needs to be toggled off and back on to
properly sample the timestamp. If a job has not been running for longer
than the timeout period, re-enable scheduling and restart the TDR (a
simplified sketch of the flow follows).
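
In outline, the reworked TDR behaves roughly as below. This is a
simplified sketch of the flow in the patch, not the exact code;
wait_for_sched_disable() is a hypothetical stand-in for the
wait_event_timeout() calls:

/* Simplified TDR flow; wedged mode, error paths and refcounting elided. */
xe_sched_submission_stop(sched);	/* kill the run_job entry point */
if (!xe_sched_job_started(job))
	goto rearm;			/* never ran, cannot have timed out */

set_exec_queue_check_timeout(q);	/* tell the G2H handlers why */
disable_scheduling(q);			/* kick job off HW, updating the ctx timestamp */
wait_for_sched_disable(q);		/* hypothetical: wait for the G2H response */

if (!check_timeout(q, job)) {		/* ran less than the timeout period? */
	clear_exec_queue_check_timeout(q);
	enable_scheduling(q);		/* put the job back on the hardware */
	goto rearm;			/* and restart the TDR */
}

/* genuine timeout: devcoredump, ban, deregister, signal fences */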

v2:
 - Use GT clock to msec helper (Umesh, off list)
 - s/ctx_timestamp_job/ctx_job_timestamp
v3:
 - Fix state machine for TDR, mainly decouple sched disable and
   deregister (testing)
 - Rebase (CI)
v4:
 - Fix checkpatch && newline issue (CI)
 - Do not deregister on wedged or unregistered (CI)
 - Fix refcounting bugs (CI)
 - Move devcoredump above VM / kernel job check (John H)
 - Add comment for check_timeout state usage (John H)
 - Assert pending disable not inflight when enabling scheduling (John H)
 - Use enable_scheduling in other scheduling enable code (John H)
 - Add comments on a few steps in TDR (John H)
 - Add assert for timestamp overflow protection (John H)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 297 +++++++++++++++++++++++------
 1 file changed, 238 insertions(+), 59 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 3db0aa40535d..8daf4e076df4 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -23,6 +23,7 @@
 #include "xe_force_wake.h"
 #include "xe_gpu_scheduler.h"
 #include "xe_gt.h"
+#include "xe_gt_clock.h"
 #include "xe_gt_printk.h"
 #include "xe_guc.h"
 #include "xe_guc_ct.h"
@@ -62,6 +63,8 @@ exec_queue_to_guc(struct xe_exec_queue *q)
 #define EXEC_QUEUE_STATE_KILLED			(1 << 7)
 #define EXEC_QUEUE_STATE_WEDGED			(1 << 8)
 #define EXEC_QUEUE_STATE_BANNED			(1 << 9)
+#define EXEC_QUEUE_STATE_CHECK_TIMEOUT		(1 << 10)
+#define EXEC_QUEUE_STATE_EXTRA_REF		(1 << 11)
 
 static bool exec_queue_registered(struct xe_exec_queue *q)
 {
@@ -188,6 +191,31 @@ static void set_exec_queue_wedged(struct xe_exec_queue *q)
 	atomic_or(EXEC_QUEUE_STATE_WEDGED, &q->guc->state);
 }
 
+static bool exec_queue_check_timeout(struct xe_exec_queue *q)
+{
+	return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_CHECK_TIMEOUT;
+}
+
+static void set_exec_queue_check_timeout(struct xe_exec_queue *q)
+{
+	atomic_or(EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
+}
+
+static void clear_exec_queue_check_timeout(struct xe_exec_queue *q)
+{
+	atomic_and(~EXEC_QUEUE_STATE_CHECK_TIMEOUT, &q->guc->state);
+}
+
+static bool exec_queue_extra_ref(struct xe_exec_queue *q)
+{
+	return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_EXTRA_REF;
+}
+
+static void set_exec_queue_extra_ref(struct xe_exec_queue *q)
+{
+	atomic_or(EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
+}
+
 static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue *q)
 {
 	return (atomic_read(&q->guc->state) &
@@ -920,6 +948,107 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
 	xe_sched_submission_start(sched);
 }
 
+#define ADJUST_FIVE_PERCENT(__t)	(((__t) * 105) / 100)
+
+static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
+{
+	struct xe_gt *gt = guc_to_gt(exec_queue_to_guc(q));
+	u32 ctx_timestamp = xe_lrc_ctx_timestamp(q->lrc[0]);
+	u32 ctx_job_timestamp = xe_lrc_ctx_job_timestamp(q->lrc[0]);
+	u32 timeout_ms = q->sched_props.job_timeout_ms;
+	u32 diff, running_time_ms;
+
+	/*
+	 * The counter wraps at ~223s at the usual 19.2 MHz; be paranoid and
+	 * catch possible overflows caused by a high timeout.
+	 */
+	xe_gt_assert(gt, timeout_ms < 100 * MSEC_PER_SEC);
+
+	if (ctx_timestamp < ctx_job_timestamp)
+		diff = ctx_timestamp + U32_MAX - ctx_job_timestamp;
+	else
+		diff = ctx_timestamp - ctx_job_timestamp;
+
+	/*
+	 * Pad the running time by 5% to account for GuC scheduling latency
+	 */
+	running_time_ms =
+		ADJUST_FIVE_PERCENT(xe_gt_clock_interval_to_ms(gt, diff));
+
+	drm_info(&guc_to_xe(exec_queue_to_guc(q))->drm,
+		 "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, running_time_ms=%u, timeout_ms=%u, diff=0x%08x",
+		 xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
+		 q->guc->id, running_time_ms, timeout_ms, diff);
+
+	return running_time_ms >= timeout_ms;
+}
+
+static void enable_scheduling(struct xe_exec_queue *q)
+{
+	MAKE_SCHED_CONTEXT_ACTION(q, ENABLE);
+	struct xe_guc *guc = exec_queue_to_guc(q);
+	struct xe_device *xe = guc_to_xe(guc);
+	int ret;
+
+	xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
+	xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
+	xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
+	xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_enable(q));
+
+	set_exec_queue_pending_enable(q);
+	set_exec_queue_enabled(q);
+	trace_xe_exec_queue_scheduling_enable(q);
+
+	xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
+		       G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
+
+	ret = wait_event_timeout(guc->ct.wq,
+				 !exec_queue_pending_enable(q) ||
+				 guc_read_stopped(guc), HZ * 5);
+	if (!ret || guc_read_stopped(guc)) {
+		drm_warn(&xe->drm, "Schedule enable failed to respond");
+		set_exec_queue_banned(q);
+		xe_gt_reset_async(q->gt);
+		xe_sched_tdr_queue_imm(&q->guc->sched);
+	}
+}
+
+static void disable_scheduling(struct xe_exec_queue *q)
+{
+	MAKE_SCHED_CONTEXT_ACTION(q, DISABLE);
+	struct xe_guc *guc = exec_queue_to_guc(q);
+
+	xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
+	xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
+	xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
+
+	clear_exec_queue_enabled(q);
+	set_exec_queue_pending_disable(q);
+	trace_xe_exec_queue_scheduling_disable(q);
+
+	xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
+		       G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
+}
+
+static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
+{
+	u32 action[] = {
+		XE_GUC_ACTION_DEREGISTER_CONTEXT,
+		q->guc->id,
+	};
+
+	xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q));
+	xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
+	xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_enable(q));
+	xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
+
+	set_exec_queue_destroyed(q);
+	trace_xe_exec_queue_deregister(q);
+
+	xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
+		       G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
+}
+
 static enum drm_gpu_sched_stat
 guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 {
@@ -928,9 +1057,10 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	struct xe_exec_queue *q = job->q;
 	struct xe_gpu_scheduler *sched = &q->guc->sched;
 	struct xe_device *xe = guc_to_xe(exec_queue_to_guc(q));
+	struct xe_guc *guc = exec_queue_to_guc(q);
 	int err = -ETIME;
 	int i = 0;
-	bool wedged;
+	bool wedged, skip_timeout_check;
 
 	/*
 	 * TDR has fired before free job worker. Common if exec queue
@@ -942,49 +1072,53 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 		return DRM_GPU_SCHED_STAT_NOMINAL;
 	}
 
-	drm_notice(&xe->drm, "Timedout job: seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx",
-		   xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
-		   q->guc->id, q->flags);
-	xe_gt_WARN(q->gt, q->flags & EXEC_QUEUE_FLAG_KERNEL,
-		   "Kernel-submitted job timed out\n");
-	xe_gt_WARN(q->gt, q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q),
-		   "VM job timed out on non-killed execqueue\n");
-
-	if (!exec_queue_killed(q))
-		xe_devcoredump(job);
-
-	trace_xe_sched_job_timedout(job);
-
-	wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
-
 	/* Kill the run_job entry point */
 	xe_sched_submission_stop(sched);
 
+	/* Must check all state after stopping scheduler */
+	skip_timeout_check = exec_queue_reset(q) ||
+		exec_queue_killed_or_banned_or_wedged(q) ||
+		exec_queue_destroyed(q);
+
+	/* Job hasn't started, can't be timed out */
+	if (!skip_timeout_check && !xe_sched_job_started(job))
+		goto rearm;
+
 	/*
-	 * Kernel jobs should never fail, nor should VM jobs if they do
-	 * somethings has gone wrong and the GT needs a reset
+	 * XXX: Sampling timeout doesn't work in wedged mode as we have to
+	 * modify scheduling state to read timestamp. We could read the
+	 * timestamp from a register to accumulate current running time but this
+	 * doesn't work for SRIOV. For now, assume timeouts in wedged mode are
+	 * genuine timeouts.
 	 */
-	if (!wedged && (q->flags & EXEC_QUEUE_FLAG_KERNEL ||
-			(q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q)))) {
-		if (!xe_sched_invalidate_job(job, 2)) {
-			xe_sched_add_pending_job(sched, job);
-			xe_sched_submission_start(sched);
-			xe_gt_reset_async(q->gt);
-			goto out;
-		}
-	}
+	wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
 
-	/* Engine state now stable, disable scheduling if needed */
+	/* Engine state now stable, disable scheduling to check timestamp */
 	if (!wedged && exec_queue_registered(q)) {
-		struct xe_guc *guc = exec_queue_to_guc(q);
 		int ret;
 
 		if (exec_queue_reset(q))
 			err = -EIO;
-		set_exec_queue_banned(q);
+
 		if (!exec_queue_destroyed(q)) {
-			xe_exec_queue_get(q);
-			disable_scheduling_deregister(guc, q);
+			/*
+			 * Wait for any pending G2H to flush out before
+			 * modifying state
+			 */
+			ret = wait_event_timeout(guc->ct.wq,
+						 !exec_queue_pending_enable(q) ||
+						 guc_read_stopped(guc), HZ * 5);
+			if (!ret || guc_read_stopped(guc))
+				goto trigger_reset;
+
+			/*
+			 * This flag tells the G2H handler that the schedule
+			 * disable originated from a timeout check; the handler
+			 * then avoids triggering cleanup or deregistering the
+			 * exec queue.
+			 */
+			set_exec_queue_check_timeout(q);
+			disable_scheduling(q);
 		}
 
 		/*
@@ -1000,15 +1134,61 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 					 !exec_queue_pending_disable(q) ||
 					 guc_read_stopped(guc), HZ * 5);
 		if (!ret || guc_read_stopped(guc)) {
+trigger_reset:
 			drm_warn(&xe->drm, "Schedule disable failed to respond");
-			xe_sched_add_pending_job(sched, job);
-			xe_sched_submission_start(sched);
+			clear_exec_queue_check_timeout(q);
+			set_exec_queue_extra_ref(q);
+			xe_exec_queue_get(q);	/* GT reset owns this */
+			set_exec_queue_banned(q);
 			xe_gt_reset_async(q->gt);
 			xe_sched_tdr_queue_imm(sched);
-			goto out;
+			goto rearm;
 		}
 	}
 
+	/*
+	 * Check if the job actually timed out; if not, restart it and the TDR
+	 */
+	if (!wedged && !skip_timeout_check && !check_timeout(q, job) &&
+	    !exec_queue_reset(q) && exec_queue_registered(q)) {
+		clear_exec_queue_check_timeout(q);
+		goto sched_enable;
+	}
+
+	drm_notice(&xe->drm, "Timedout job: seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx",
+		   xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
+		   q->guc->id, q->flags);
+	xe_gt_WARN(q->gt, q->flags & EXEC_QUEUE_FLAG_KERNEL,
+		   "Kernel-submitted job timed out\n");
+	xe_gt_WARN(q->gt, q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q),
+		   "VM job timed out on non-killed execqueue\n");
+
+	trace_xe_sched_job_timedout(job);
+
+	if (!exec_queue_killed(q))
+		xe_devcoredump(job);
+
+	/*
+	 * Kernel jobs should never fail, nor should VM jobs; if they do,
+	 * something has gone wrong and the GT needs a reset
+	 */
+	if (!wedged && (q->flags & EXEC_QUEUE_FLAG_KERNEL ||
+			(q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q)))) {
+		if (!xe_sched_invalidate_job(job, 2)) {
+			clear_exec_queue_check_timeout(q);
+			xe_gt_reset_async(q->gt);
+			goto rearm;
+		}
+	}
+
+	/* Finish cleaning up exec queue via deregister */
+	set_exec_queue_banned(q);
+	if (!wedged && exec_queue_registered(q) && !exec_queue_destroyed(q)) {
+		set_exec_queue_extra_ref(q);
+		xe_exec_queue_get(q);
+		__deregister_exec_queue(guc, q);
+	}
+
 	/* Stop fence signaling */
 	xe_hw_fence_irq_stop(q->fence_irq);
 
@@ -1030,7 +1210,19 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	/* Start fence signaling */
 	xe_hw_fence_irq_start(q->fence_irq);
 
-out:
+	return DRM_GPU_SCHED_STAT_NOMINAL;
+
+sched_enable:
+	enable_scheduling(q);
+rearm:
+	/*
+	 * XXX: Ideally we would adjust the timeout based on the current
+	 * execution time, but there is no easy way to do that in the DRM
+	 * scheduler today. With some thought, do this in a follow-up.
+	 */
+	xe_sched_add_pending_job(sched, job);
+	xe_sched_submission_start(sched);
+
 	return DRM_GPU_SCHED_STAT_NOMINAL;
 }
 
@@ -1133,7 +1325,6 @@ static void __guc_exec_queue_process_msg_suspend(struct xe_sched_msg *msg)
 			   guc_read_stopped(guc));
 
 		if (!guc_read_stopped(guc)) {
-			MAKE_SCHED_CONTEXT_ACTION(q, DISABLE);
 			s64 since_resume_ms =
 				ktime_ms_delta(ktime_get(),
 					       q->guc->resume_time);
@@ -1144,12 +1335,7 @@ static void __guc_exec_queue_process_msg_suspend(struct xe_sched_msg *msg)
 				msleep(wait_ms);
 
 			set_exec_queue_suspended(q);
-			clear_exec_queue_enabled(q);
-			set_exec_queue_pending_disable(q);
-			trace_xe_exec_queue_scheduling_disable(q);
-
-			xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
-				       G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
+			disable_scheduling(q);
 		}
 	} else if (q->guc->suspend_pending) {
 		set_exec_queue_suspended(q);
@@ -1160,19 +1346,11 @@ static void __guc_exec_queue_process_msg_suspend(struct xe_sched_msg *msg)
 static void __guc_exec_queue_process_msg_resume(struct xe_sched_msg *msg)
 {
 	struct xe_exec_queue *q = msg->private_data;
-	struct xe_guc *guc = exec_queue_to_guc(q);
 
 	if (guc_exec_queue_allowed_to_change_state(q)) {
-		MAKE_SCHED_CONTEXT_ACTION(q, ENABLE);
-
 		q->guc->resume_time = RESUME_PENDING;
 		clear_exec_queue_suspended(q);
-		set_exec_queue_pending_enable(q);
-		set_exec_queue_enabled(q);
-		trace_xe_exec_queue_scheduling_enable(q);
-
-		xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
-			       G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
+		enable_scheduling(q);
 	} else {
 		clear_exec_queue_suspended(q);
 	}
@@ -1434,8 +1612,7 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
 
 	/* Clean up lost G2H + reset engine state */
 	if (exec_queue_registered(q)) {
-		if ((exec_queue_banned(q) && exec_queue_destroyed(q)) ||
-		    xe_exec_queue_is_lr(q))
+		if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q))
 			xe_exec_queue_put(q);
 		else if (exec_queue_destroyed(q))
 			__guc_exec_queue_fini(guc, q);
@@ -1615,11 +1792,13 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q)
 		if (q->guc->suspend_pending) {
 			suspend_fence_signal(q);
 		} else {
-			if (exec_queue_banned(q)) {
+			if (exec_queue_banned(q) ||
+			    exec_queue_check_timeout(q)) {
 				smp_wmb();
 				wake_up_all(&guc->ct.wq);
 			}
-			deregister_exec_queue(guc, q);
+			if (!exec_queue_check_timeout(q))
+				deregister_exec_queue(guc, q);
 		}
 	}
 }
@@ -1657,7 +1836,7 @@ static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q)
 
 	clear_exec_queue_registered(q);
 
-	if (exec_queue_banned(q) || xe_exec_queue_is_lr(q))
+	if (exec_queue_extra_ref(q) || xe_exec_queue_is_lr(q))
 		xe_exec_queue_put(q);
 	else
 		__guc_exec_queue_fini(guc, q);
@@ -1720,7 +1899,7 @@ int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
 	 * guc_exec_queue_timedout_job.
 	 */
 	set_exec_queue_reset(q);
-	if (!exec_queue_banned(q))
+	if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
 		xe_guc_exec_queue_trigger_cleanup(q);
 
 	return 0;
@@ -1750,7 +1929,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
 
 	/* Treat the same as engine reset */
 	set_exec_queue_reset(q);
-	if (!exec_queue_banned(q))
+	if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
 		xe_guc_exec_queue_trigger_cleanup(q);
 
 	return 0;
-- 
2.34.1



* ✓ CI.Patch_applied: success for Only timeout jobs if they run longer than timeout period
From: Patchwork @ 2024-06-10 13:54 UTC
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Only timeout jobs if they run longer than timeout period
URL   : https://patchwork.freedesktop.org/series/134677/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: 183cef86631f drm-tip: 2024y-06m-10d-12h-51m-07s UTC integration manifest
=== git am output follows ===
Applying: drm/xe: Add ctx timestamp to LRC snapshot
Applying: drm/xe: Add xe_gt_clock_interval_to_ms helper
Applying: drm/xe: Improve unexpected state error messages
Applying: drm/xe: Add GuC state asserts to deregister_exec_queue
Applying: drm/xe: Add pending disable assert to handle_sched_done
Applying: drm/xe: Add killed, banned, or wedged as stick bit during GuC reset
Applying: drm/xe: Sample ctx timestamp to determine if jobs have timed out




* ✗ CI.checkpatch: warning for Only timeout jobs if they run longer than timeout period
From: Patchwork @ 2024-06-10 13:54 UTC
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Only timeout jobs if they run longer than timeout period
URL   : https://patchwork.freedesktop.org/series/134677/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
51ce9f6cd981d42d7467409d7dbc559a450abc1e
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit ec39258838418f072f5bd72919282e074cd36aea
Author: Matthew Brost <matthew.brost@intel.com>
Date:   Mon Jun 10 06:50:11 2024 -0700

    drm/xe: Sample ctx timestamp to determine if jobs have timed out
    
    In GuC TDR sample ctx timestamp to determine if jobs have timed out. The
    scheduling enable needs to be toggled to properly sample the timestamp.
    If a job has not been running for longer than the timeout period,
    re-enable scheduling and restart the TDR.
    
    v2:
     - Use GT clock to msec helper (Umesh, off list)
     - s/ctx_timestamp_job/ctx_job_timestamp
    v3:
     - Fix state machine for TDR, mainly decouple sched disable and
       deregister (testing)
     - Rebase (CI)
    v4:
     - Fix checkpatch && newline issue (CI)
     - Do not deregister on wedged or unregistered (CI)
     - Fix refcounting bugs (CI)
     - Move devcoredump above VM / kernel job check (John H)
     - Add comment for check_timeout state usage (John H)
     - Assert pending disable not inflight when enabling scheduling (John H)
     - Use enable_scheduling in other scheduling enable code (John H)
     - Add comments on a few steps in TDR (John H)
     - Add assert for timestamp overflow protection (John H)
    
    Signed-off-by: Matthew Brost <matthew.brost@intel.com>
+ /mt/dim checkpatch 183cef86631f1ea60930365f10582bb963a790f1 drm-intel
66a84e18ab11 drm/xe: Add ctx timestamp to LRC snapshot
7e7d4d60f5d6 drm/xe: Add xe_gt_clock_interval_to_ms helper
de7afa65bac3 drm/xe: Improve unexpected state error messages
53624491cb1e drm/xe: Add GuC state asserts to deregister_exec_queue
137b80c80e9b drm/xe: Add pending disable assert to handle_sched_done
2d43a417f2b9 drm/xe: Add killed, banned, or wedged as stick bit during GuC reset
ec3925883841 drm/xe: Sample ctx timestamp to determine if jobs have timed out
-:89: CHECK:SPACING: No space is necessary after a cast
#89: FILE: drivers/gpu/drm/xe/xe_guc_submit.c:951:
+#define ADJUST_FIVE_PERCENT(__t)	(((__t) * 105) / 100)

-:89: ERROR:SPACING: space prohibited after that '*' (ctx:WxW)
#89: FILE: drivers/gpu/drm/xe/xe_guc_submit.c:951:
+#define ADJUST_FIVE_PERCENT(__t)	(((__t) * 105) / 100)
                                 	        ^

total: 1 errors, 0 warnings, 1 checks, 420 lines checked




* ✗ CI.KUnit: failure for Only timeout jobs if they run longer than timeout period
From: Patchwork @ 2024-06-10 13:55 UTC
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Only timeout jobs if they run longer than timeout period
URL   : https://patchwork.freedesktop.org/series/134677/
State : failure

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../drivers/gpu/drm/xe/xe_guc_submit.c: In function ‘check_timeout’:
../drivers/gpu/drm/xe/xe_guc_submit.c:956:22: error: implicit declaration of function ‘xe_lrc_ctx_timestamp’; did you mean ‘xe_lrc_update_timestamp’? [-Werror=implicit-function-declaration]
  956 |  u32 ctx_timestamp = xe_lrc_ctx_timestamp(q->lrc[0]);
      |                      ^~~~~~~~~~~~~~~~~~~~
      |                      xe_lrc_update_timestamp
../drivers/gpu/drm/xe/xe_guc_submit.c:957:26: error: implicit declaration of function ‘xe_lrc_ctx_job_timestamp’; did you mean ‘xe_lrc_update_timestamp’? [-Werror=implicit-function-declaration]
  957 |  u32 ctx_job_timestamp = xe_lrc_ctx_job_timestamp(q->lrc[0]);
      |                          ^~~~~~~~~~~~~~~~~~~~~~~~
      |                          xe_lrc_update_timestamp
cc1: some warnings being treated as errors
make[7]: *** [../scripts/Makefile.build:244: drivers/gpu/drm/xe/xe_guc_submit.o] Error 1
make[7]: *** Waiting for unfinished jobs....
../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
  156 | u64 ioread64_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
  163 | u64 ioread64_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
  170 | u64 ioread64be_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
  178 | u64 ioread64be_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
  264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
  272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
  280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
  288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_lrc.c: In function ‘xe_lrc_snapshot_capture’:
../drivers/gpu/drm/xe/xe_lrc.c:1581:28: error: implicit declaration of function ‘xe_lrc_ctx_timestamp’; did you mean ‘xe_lrc_update_timestamp’? [-Werror=implicit-function-declaration]
 1581 |  snapshot->ctx_timestamp = xe_lrc_ctx_timestamp(lrc);
      |                            ^~~~~~~~~~~~~~~~~~~~
      |                            xe_lrc_update_timestamp
../drivers/gpu/drm/xe/xe_lrc.c:1582:32: error: implicit declaration of function ‘xe_lrc_ctx_job_timestamp’; did you mean ‘xe_lrc_update_timestamp’? [-Werror=implicit-function-declaration]
 1582 |  snapshot->ctx_job_timestamp = xe_lrc_ctx_job_timestamp(lrc);
      |                                ^~~~~~~~~~~~~~~~~~~~~~~~
      |                                xe_lrc_update_timestamp
cc1: some warnings being treated as errors
make[7]: *** [../scripts/Makefile.build:244: drivers/gpu/drm/xe/xe_lrc.o] Error 1
make[6]: *** [../scripts/Makefile.build:485: drivers/gpu/drm/xe] Error 2
make[5]: *** [../scripts/Makefile.build:485: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:485: drivers/gpu] Error 2
make[3]: *** [../scripts/Makefile.build:485: drivers] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [/kernel/Makefile:1934: .] Error 2
make[1]: *** [/kernel/Makefile:240: __sub-make] Error 2
make: *** [Makefile:240: __sub-make] Error 2

[13:54:42] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[13:54:46] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



